Software security expert Tanya Janca, author of Alice and Bob Learn Secure Coding and Staff DevRel at AppSec company Semgrep, joins Ryan to talk about secure coding practices. Tanya unpacks the significance of input validation, the challenges of trusting data sources, and the intersection of security and law. Bonus: what she learned trying to secure a Canadian national election.
Semgrep is an AppSec platform that lets devs deploy static application security testing (SAST), software composition analysis (SCA), and secret scans. Explore their docs.
Tanya is the author of Alice and Bob Learn Secure Coding and Alice and Bob Learn Application Security.
She’s also written for our blog: Three layers to secure a software development organization and Continuous delivery, meet continuous security.
Secure coding might be an issue of national security.
Follow Tanya on LinkedIn or check out her website.
Stack Overflow user Reishin earned a Populist badge with their answer to piping from stdin to a python code in a bash script.
[intro music plays]
Ryan Donovan Hello everyone, and welcome to the Stack Overflow Podcast, a place to talk all things software and technology. I'm Ryan Donovan, your humble host, and today we have a great guest, Tanya Janca. She is DevRel at Semgrep, she is a contributor to the Stack Overflow Blog, and she's also a published author. Today we're going to be talking about her book, Alice and Bob Learn Secure Coding. So welcome to the podcast, Tanya.
Tanya Janca Thank you so much for having me, Ryan.
RD Of course. So beginning of the show, we like to ask our guests how they got into the world of software and technology. So what's your origin story?
TJ So four of my five uncles are computer scientists and both my aunts are computer scientists.
RD Kind of a nepo baby.
TJ Basically, yeah. And so when I was little, I got a computer when I was eight when no one else had a home computer and then they taught me DOS when I was a teenager and then I learned programming. I think I was writing C when I was 16. And then I got my first job at a high tech company when I was 18, the moment I was legal. So you could say it's kind of the only thing I've ever been super good at, but I also have played music in my life too. So I've sort of had this side career and then main career of computer science for a long time.
RD I found there's a pretty good overlap of computer and musicians too.
TJ It's true. I think creating music is quite mathematical, if that makes any sense.
RD Absolutely. Today we're going to be talking about secure coding and your book, and it's called Alice and Bob Learn Secure Coding, and I know that's an old trope in computer science and formalized logic. Can you talk about why you chose the Alice and Bob paradigm?
TJ Yes. So the year I was born, Alice and Bob were born and Alice and Bob were used to explain cryptography to normals, to people that aren't mathematicians, essentially. And so Alice wants to tell Bob a secret. How can she make sure that only Bob learns the secret? How can Bob be sure that the message is truly from Alice? And I remember in my co-op interview –which is an internship in Canada, we call them co-op– when I was maybe, oh gosh, 19 or 20, them explaining that you're going to test an SSL accelerator and this is what SSL is –which we don't use anymore, we use TLS now– but explaining with Alice and Bob how the things went back and forth. And so when I went to name my first book, all my blogs and all my talks, when I gave examples I would say, “Oh, it's not that Alice is malicious, it's that she had to get her job done and that's why she broke this policy. It's not that Bob makes lots of mistakes. It's that there was no security safeguard there to protect them.” And I kept using them as examples, and so when I was naming the first book, I was waffling between, should I name it Alice and Bob Learn Application Security, or should I name it The Application Security Handbook, which sounds very serious. And my publisher is like, “Your book's weird, and it's silly and fun and there's parts where you laugh out loud which is not usual for a textbook and we don't think you should have a super serious name.” The name is boring– Application Security Handbook. So we went with the more creative name and so that is where Alice and Bob came from.
RD That's awesome. Let's talk about the other part of the title, the secure coding. For folks who are wondering what is secure coding, what are the things that you have to think about for that?
TJ Well, to simplify it, it's the idea of making sure your code is the most secure and safe that it can be, so it is defensive in nature generally. Or you would use, for instance, a newer framework that has better security features rather than using an older framework or a framework that has fewer security features. You would choose, for instance, a memory-safe language over a language that is not memory-safe. You would use functions within the language that you know are safer rather than ones that aren't as safe. So for instance, before we started recording, we were talking about C briefly. In C, they realized very quickly, because it's not memory-safe, that people could overflow certain functions. So if you're doing string manipulations, integer manipulations, etc., when you overflow, you go out into the stack or, if you're really creative, into the heap, and this causes problems. So they came up with functions that were safer to use, that were less likely to have those things happen. And so if you're going to write secure code, you're going to try to use the safer functions or make sure that your code itself can defend itself. So for instance, if you get some data from the user –say, I don't know, you write a giant site that gives coding advice– you have to accept a lot of very dangerous characters. I was explaining how Stack Overflow is the most difficult input validation scenario of all time because you have to accept tons and tons of arbitrary code.
RD Yeah, but you don't have to run it.
TJ Yes, that's true. Ideally you're not running it.
RD Ideally, yeah.
TJ But so how to handle that safely, that's the idea of secure coding.
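As a rough illustration of the safer-function choice Tanya describes –a minimal sketch, not code from the book or the episode– the classic unbounded C calls will write past the end of a buffer, while their bounded counterparts take the destination size. The buffer names and sizes here are purely illustrative.

```c
#include <stdio.h>
#include <string.h>

int main(void) {
    char name[16];

    /* Unsafe choices: gets() has no bounds check at all (removed in C11),
       and strcpy()/sprintf() will happily write past the end of name. */
    /* gets(name);                     -- never use this                */
    /* strcpy(name, untrusted_input);  -- overflows if input >= 16 chars */

    /* Safer choices: bounded reads and copies that know the buffer size. */
    if (fgets(name, sizeof name, stdin) == NULL) {
        return 1;                        /* handle read failure */
    }
    name[strcspn(name, "\n")] = '\0';    /* strip trailing newline */

    char greeting[32];
    snprintf(greeting, sizeof greeting, "Hello, %s", name); /* truncates, never overflows */
    puts(greeting);
    return 0;
}
```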
RD We did a little coverage a few months back, the White House said don't use memory-unsafe languages, use Rust instead of C basically. Do you think that sort of secure coding memory safety is that significant as a national security issue?
TJ So I have some opinions on this. So we can't afford to rewrite every single C program into Rust, that's just unrealistic. And the Rust ecosystem isn't as rich yet as C and C++. There's just not as much to offer yet. I'm hoping that changes. And then we also have tons and tons of people that know how to write C that don't know how to write Rust yet. So there's this whole ‘that's going to cost a fortune’ problem. It's not realistic to just say, “Thou shalt not do this anymore,” just like saying thou shalt never have sugar ever again, or soda, or anything delicious, because we should all be perfectly healthy. Should we choose sugar less often? Yes. If we're going to write a brand new app and we're choosing between Rust and C, I would love to get people to use Rust or just something that is memory-safe. Mozilla released an article, I think in 2021 or 2022, where they rewrote their browser and a bunch of other things into Rust from C, and they said it removed something like 73% of their bugs. That is huge, and memory-safety bugs are specifically rather terrifying because they affect a bunch of different things. So if you have a memory safety problem, it can make your program crash, so no availability. Security folks and developers, we do not like our apps being crashed, that's very bad. But on top of that, if you can manage to overflow into the memory, you can overwrite the stack pointer with your malicious code so that, like we were talking about earlier, it executes your code instead of the nice Stack Overflow code or Tanya's code or whoever actually wrote this app, and then you can try to take control of the web server, which is a very worst case scenario. Remote code execution or RCE is the name of that situation and you don't want that. And so those things –lack of availability, having your app act in ways that you can't predict, and/or takeovers– are obviously bad.
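To make the overflow scenario concrete, here is a deliberately unsafe sketch (again, not from the book): copying attacker-controlled input with no bounds check spills into adjacent stack memory, which is why the results range from crashes to attacker-controlled behavior. The exact effect depends on how the compiler lays out the stack, so the function and variable names are illustrative only.

```c
#include <stdio.h>
#include <string.h>

/* Deliberately unsafe: copies attacker-controlled input with no bounds check.
 * Writing past buf[7] lands in adjacent stack memory -- here possibly the
 * is_admin flag, in a real exploit the saved return address. */
static void check_user(const char *input) {
    int  is_admin = 0;
    char buf[8];

    strcpy(buf, input);               /* overflow if strlen(input) >= 8 */

    if (is_admin) {
        puts("admin access granted"); /* only reachable via memory corruption */
    } else {
        puts("regular user");
    }
}

int main(void) {
    check_user("short");                      /* fine */
    check_user("AAAAAAAAAAAAAAAAAAAAAAAA");   /* undefined behavior: may flip
                                                 is_admin, crash, or worse   */
    return 0;
}
```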
RD And I know that there's some super developers out there that rankle at some of the restrictions. They like C because of the manual memory allocation. They like the manual garbage collection. Are there ways to be safe without putting handcuffs on them?
TJ Absolutely. So you can program C really safely. You need to do a few things. So first of all, I would suggest a secure system development lifecycle. So you want to make sure that you have requirements upfront of how to do garbage collection, how to do this, how to do that, and you want to make sure you do it the same way, ideally, in every single app that you're going to do so that it's very easy to review the code later. You want to make sure that you've had some training on how to write secure C. That would be really helpful so you know you should use these functions over those functions. If you're doing this, there's some risk here, so you should handle it this way. So if you have some training on it, then you can still have the freedom to manage things the way you want to and, for instance, write something that's really, really, really efficient and fast and low level, but do it safely. I would also suggest using a static analysis tool that specializes in C and C++. Some of them are better than others. So I work at a company that sells a static analysis tool, and I'm going to tell you no matter which one you get, you want to try it out. You want to make sure it has really good coverage for the language that you're purchasing it for because if you're purchasing it for C, they're not all good at that, and I'm not going to claim that we are the best. I definitely want you to get the best one. If all you're doing is writing C and C++ all day, test it out very carefully. I would also say that there are some books that I could recommend, and I do recommend them in my book, that dive really deep into C. I'm thinking of my bookshelf behind me. There's a book by Gary McGraw that's really good that I can't remember the title of right now, but I can get it for you for the notes. If you want, I could recommend two books. There's Gary's book and then I forget the name of the other book. I think it's Robert Pearl, but I'm sure I'm getting that wrong because I'm on the spot, but it's literally just hundreds of pages of writing secure C. I only wrote something like five or six pages about writing secure C and C++, so there's a lot to say there, and he has labs and stuff as well, which is really cool. So I would definitely suggest reading, I'm pretty sure it's Robert's book, and then Gary's book as well.
RD Okay. So if you only have the five or six pages on the secure C and C++, there's a whole other world of things to consider. What are the other things to consider? Even if you have a good language with great memory management, great garbage collection, what else is going on?
TJ Okay, so there's so much. So first of all, the most important thing, I think, once you've chosen your language, you've chosen your framework –so I'm hoping you're going to choose a more modern framework, a more modern language if you can, because by default there's more security there or there's more security features– but when you start writing code, input validation is number one. So when I say ‘validation,’ I don't mean sanitization. I mean a series of steps that I outline in the book, but I also talk about constantly and I'm very annoying about. So I think that we should validate that whatever it is we're getting is what we're expecting. So if you're expecting a date, it should really be a date. If you are expecting a date, it should be in a certain format. If it's a date of birth, it should probably not be 500 years in the past, and it should not be in the future –that makes no sense, it's not what we're expecting. So we make sure whatever it is we've gotten is within the realm of what we're expecting, and whenever possible, we have an approved list of things, or an allow list. So for accepting URLs, for instance, there should be domains that are okay and domains that aren't. And if you can compare it against a list of what you know is known to be good, you're so much safer. And then once you have it, then either escape or sanitize special, potentially dangerous characters. So if you're going to a database, single quotes can be rather dangerous, or hyphens for instance, and so you would either escape those or sanitize them out. It's your choice. I prefer escaping, lots of people prefer sanitization. As long as your whole project team agrees on what you're doing, you should be okay. So if we could get those two things right, I think that we would be able to eliminate a lot and a lot and a lot of vulnerabilities. I don't think there'd be injection, I don't think there'd be cross-site scripting, server-side request forgery could go away too. There's just so many things that wouldn't happen if we validated everything that we receive. And when I say ‘receive,’ I also mean you're getting stuff from a database and maybe it's stuff that was from the user. We don't trust that. I'm calling an API, maybe I don't trust that. I'm sending something across the wire and it's not encrypted. I don't trust that.
RD There's the classic XKCD comic about Bobby Tables and sanitizing your inputs, but this sounds like going a step further. Are there easy ways to get a list of good URLs? What's a list of valid birthdays?
TJ This is something that you would talk with your business representative or whoever is your product owner who's defining these things. So it's like, “Oh, you want a date of birth. Cool. How about -150 years from now till yesterday? Does that work for a date of birth?” And they'll probably be like, “Sure.”
RD I mean it works for COBOL. That's how they do the dates.
TJ Oh my gosh, I remember learning COBOL back in the day. Oh, my.
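A minimal sketch of the validation steps Tanya outlines, using the date-of-birth range she and Ryan joke about (roughly 150 years back, nothing in the future) plus a hard-coded domain allow list. The formats, ranges, and domain names are illustrative assumptions, not anything prescribed in the book.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Allow list: compare against values known to be good, never a block list. */
static const char *allowed_domains[] = { "example.com", "api.example.com" };

static int domain_is_allowed(const char *domain) {
    for (size_t i = 0; i < sizeof allowed_domains / sizeof allowed_domains[0]; i++) {
        if (strcmp(domain, allowed_domains[i]) == 0) return 1;
    }
    return 0;
}

/* Validate a date of birth: expected format, plausible calendar values, and
 * inside the agreed business range (here, roughly 150 years ago to today). */
static int dob_is_valid(const char *input) {
    int y, m, d;
    if (sscanf(input, "%4d-%2d-%2d", &y, &m, &d) != 3) return 0;  /* expected YYYY-MM-DD */
    if (m < 1 || m > 12 || d < 1 || d > 31) return 0;

    time_t now = time(NULL);
    struct tm today = *localtime(&now);
    int this_year = today.tm_year + 1900;

    if (y < this_year - 150) return 0;  /* unreasonably far in the past */
    if (y > this_year) return 0;        /* not beyond the current year; a full
                                           check would compare month and day too */
    return 1;
}

int main(void) {
    printf("%d %d\n", dob_is_valid("1980-05-17"), dob_is_valid("2300-01-01")); /* 1 0 */
    printf("%d\n", domain_is_allowed("evil.example.net"));                     /* 0 */
    return 0;
}
```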
RD So in terms of sanitizing and validating inputs, are there things you can trust? Can you trust internal RPC calls? Can you trust internal data sources?
TJ It depends, which is the most annoying answer that anyone can give. So if I'm going to go take something from the database, let's say –so I'm from Canada, you'll probably notice my accent when I say ‘eh’ or ‘about’ at some point. I don't think I do it, but then I see myself doing it on video, so it's true. Anyway, our provinces don't change names very often. So in my lifetime, we've had two changes. So two changes in 45-ish years, not bad. If we look at that, that table is not going to change very often, so we can hard code that table. So when I worked in the Canadian government, we'd have one table that was provinces and then all of our apps would pull from that and we would know it was safe. But let's say there's a table where a user from the public fills out a form and I'm saving that data in there, and then later that exact same app is going to pull the data out. So if I validated it before I put it in there, I will trust it when I take it out. However, let's say I am grabbing stuff from the public and I'm saving it into a database, and then you, at your separate company, are calling an API that pulls from there. I wouldn't trust it. I would be like, “I don't know that Tanya lady.” I would validate first that it is what I'm expecting. And when we have input validation problems, sometimes it's an implementation issue –they've made a block list, things that are bad that they shouldn't accept, which is really, really hard to get right– or it's that they've forgotten it altogether and trusted something they shouldn't have trusted. They're like, “Oh, but that's an API from within my company,” and I'm like, “Yeah, but it takes data from a customer's database, or it's taking it from this public place, or it goes through here and it's unencrypted,” you know what I mean? So if there's a chance that someone else has changed it, I don't trust it. Or they're like, “Oh, it was in a hidden field.” That's not hidden from me. Anyone with a copy of Burp Suite, it's not hidden from them. Or, “Oh, it was in the URL parameters.” No, no, no, not safe. Not safe. When I was a dev –because I switched to security so long ago– I remember in 2011 saying to the pen tester, “How dare you change my URL parameters?” I was like, “This is blasphemy. Those are my parameters, not yours.” And he's like, “They're right there. I could just change them.”
RD I mean, that's an interesting, anything you give to your client, you can no longer trust it. Any client-side code, I can go in and change that on my system.
TJ Yes. I remember when I first started learning about security, I went to capture the flag contests to try to learn. And I'd go and I'd always be the only woman everywhere I went and it was really frustrating, so I made a team of all women to go with me, and we wore cute party dresses that time because we thought that'd be fun. And so I remember showing them, “Oh, this is how you do an SQL injection on this login screen,” and explaining, “Oh, this is how I get around this and this is how I get around that.” Basically I just threw up a web proxy, we went right past the input validation and the JavaScript, I changed it to what I wanted, and then we got in, and one of the women just turned white. She's like, “I need to go to my office right now,” and she said she stayed all night just checking every app to make sure that that didn't work. Just so worried. And she's like, “We're okay,” but good for her.
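The server-side counterpart to the client-side checks a web proxy can strip away is a parameterized query. This fragment uses SQLite's C API purely as one concrete example; the table and column names are hypothetical, and it assumes an already-open database handle.

```c
#include <sqlite3.h>
#include <stdio.h>

/* Server-side defense: a parameterized query. Even if a proxy strips the
 * client-side JavaScript checks, the username is bound as data, never
 * spliced into the SQL text, so "' OR '1'='1" stays a literal string. */
static int user_exists(sqlite3 *db, const char *username) {
    static const char *sql = "SELECT 1 FROM users WHERE username = ?1;";
    sqlite3_stmt *stmt = NULL;
    int found = 0;

    if (sqlite3_prepare_v2(db, sql, -1, &stmt, NULL) != SQLITE_OK) return 0;
    sqlite3_bind_text(stmt, 1, username, -1, SQLITE_TRANSIENT);

    if (sqlite3_step(stmt) == SQLITE_ROW) found = 1;  /* a row came back */
    sqlite3_finalize(stmt);
    return found;
}
```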
RD I mean, it sounds like a lot of the stuff you've learned and other people have learned has sort of been being on the other side of it, trying to break it and seeing other people break things. What was the most surprising jailbreak that you've seen?
TJ Oh, still to this day, the very first time I saw SQL injection it just blew my mind. So basically I used to run a community of practice for my developer team, and I met this pen tester because I was a musician playing all the bars at night and he was too so we became friends. So then he kept speaking for my community of practice and so he took one of our apps and then he did an SQL injection and got past the login screen, and he's like, “This is only going to take a minute or two. It's going to take that long because I'm talking.” And I was just like, “What? No, that's not going to work. That's my baby.”
RD Not anymore.
TJ I know. My baby’s so ugly. And so I ended up becoming his apprentice and learning from him. And in the book I explain how things ended between us not very nicely, and I learned a lot about trusting and not trusting people, who to learn from, and what not to do when it comes to ethics. And so I talk a lot about ethics in the book, because I feel like you can go and learn a lot of really malicious things on the internet very easily, and being taught by someone that doesn't have good ethics can result in you ending up in jail someday. It can result in you not finding work ever again if you do some things you shouldn't, and if you're learning from the wrong people, that can happen. So I talk about that a fair bit in the book, because I almost got led down the wrong path.
RD And I don't think you even need to do it yourself. I heard a story that the guy who created cURL was on a no-fly list because cURL was included in various malicious programs.
TJ Yeah, but cURL is not made for that.
RD It's used by malicious actors.
TJ Yeah, but that's like saying malicious actors use FTP, and therefore… People use APIs for malicious uses, so we'll just never use APIs again? I don't know. It's just a tool. But cURL, come on.
RD But in terms of the intersection between computer security and the law, it's not always well understood by the folks who make the law.
TJ Yes, and that is actually quite disconcerting, that you have people making decisions that aren't specialists in that area, and if you're not able to communicate clearly enough, you can't get the message across. I'm quite concerned about things like that ever since the case with Joe Sullivan. He was a chief information security officer who had charges pressed against him for decisions made at work, and I'm not going to comment on the rightness or wrongness of that, but it's terrifying to think about, because I was the CISO for the election in Canada in 2015, and I did all the best things I could, and I couldn't imagine going to jail for doing my job, assuming I have not knowingly broken any laws. It's kind of terrifying, but I do think we should be accountable for the work we do, especially if you're the decision maker. It has definitely caused a lot of conversations and new decisions. It definitely makes me never want to be a CISO again, although I already didn't want to. It is so hard, Ryan. It is such a hard job. I do not want that job.
RD I mean, it's all defense. You have to be right every time. Whoever's coming at you has to be right once.
TJ Yes, and also you have to constantly battle with the business for funds and for permission and you have to defend every single thing that you do. I remember the chief electoral officer, he has to monitor the election, and he's like, “You keep blocking all these sites on me. I want you to turn off all the web filtering.” And I was like, “Sir, I cannot do that because you are the number one target, and here's the reasons you're the target, but I can give you a one-hour or less turnaround time. I'll have it go straight to my team and no matter what we're doing, we will get back to you within an hour, except for overnight where you have to wait till the morning.” And he's like, “I agree to your term.” And then later he told me, he's like, “If you had said ‘yes,’ you were fired.” Because it's hard to say no to your boss's boss's boss's boss.
RD Sure. I mean, the head of the thing that you're doing. But it's incredibly valuable to be able to disagree with the higher-ups. I remember at a previous job, because of resignations I ended up reporting to the CTO and she was a very sharp-minded woman, but anytime I disagreed with her, she'd be like, “Yes, this is what I want, but here's why you're wrong.”
TJ That is very helpful though. I feel like when I switched to security I had to learn how to communicate a lot better because I could just write code before that did the thing I wanted and then just show my teammate and they would get it, if that makes sense. But then it was like, “I have to use my words,” and I wasn't as strong with words at first and I had to learn about persuasion, I had to learn about negotiation, I had to learn a lot about leadership because I was a dev and then I was a senior dev and then I tried management a few times and I just hated it and kept going back down to technical positions and then I switched over to security and I was like, “I really have to learn to talk to people. I can't just raise my voice.” Because I remember getting into a fight with a bunch of developers who were editing code live on the production server and that was the only copy, and I was like, “You have to use version control,” and they're like, “Why? Version control is stupid.” And this was 2012.
RD Wow.
TJ And I was like, “Because it's industry best practice and we're not stupid,” and my boss was like, “You can't just call everyone stupid.” I was like, “I was saying we're not stupid.” He's like, “So go home and come back tomorrow and use your words.” And I had to teach a little lesson about why version control was important and how we were all going to do it or else Tanya is going to breathe fire on everyone.
RD It's explaining why this is actually bad, because I think there's some devs who will hear you say ‘these are industry best practices’ and they'll be like, “Why? Drill me down to first principles.”
TJ Yeah, exactly. And with security, you have to be able to explain the risk and why it matters, because they have 20,000 other things they're supposed to be doing and they have hundreds of other bugs they're supposed to look at. Why are they looking at mine? And I remember I was working at a place that will remain unnamed and we had something big happening, something big that was coming, and I was like, “I need to get as many security things in place before the big thing happens and we launch all this stuff.” And everyone kept pushing back on me and they're like, “Well, I have all these other deadlines. No.” And so what I did was I invented the risk sign-off sheet, which is just a bunch of bull, but basically I was like, “These are the things that I absolutely need fixed before this deadline, and here's what could happen to our citizens if we don't, and here's what could happen to our employees if we don't.” And I got very, very specific and I was like, “This is a scenario we do regularly and this could easily happen if someone did this, and this is what the news would look like.” And then I gave it to the chief security officer and he was like, “I'm not signing this!” And I remember him slamming it down on his desk. He's like, “No one would ever sign this.” I'm like, “Yes, but I've sent it to you via email so you are aware of it, so if you don't sign it, you've still accepted the risk by performing an action.” He's like, “Ooh.” I was like, “I need you to give me the authority to get people to fix this.” So he's like, “You tell them Bill sent you,” and everyone was like, “Bill!” and then everyone fixed the things and it was great and then we launched all of our stuff and it went really well and we were not in any newspapers. There were no breaches and no incidents and it was great.
RD I mean, making sure people know whose butt is on the line is an important part of arguing for security. And we've seen a lot of folks trying to push security, shifting it left into the software development lifecycle. I mean, obviously I think we probably agree that that's a good thing. Do you think it is a possible thing for all organizations?
TJ I do, depending upon how you do it. So I think we should start small and add one or two things to our SDLC. So first of all, you have to make sure you're actually doing a system development lifecycle. So the developers, for instance, that were just editing code live on prod with no version control, they were not following a system development lifecycle for their projects either. I remember them explaining, “We're doing agile, that's why there's zero documentation and no meetings,” and I was like, “That's not what… The agile people might disagree with you.” They're like, “Stop making agile a dirty word.” So first, and ideally, you're all doing the same system development lifecycle, and then adding security steps, especially the ones that you think will help the most the fastest. So for instance, I'm a big fan of having some requirements. So for each web app, you would have certain requirements. For an API, you'd have certain requirements. So for every different technology –we know if we're doing IoT devices that, because there's a physical device, we're going to need physical security and we're going to need digital security– how do we do this? And so if you can start with a list of what you want, you're a lot more likely to get it, because what a lot of security folks do is tell you at the end how wrong you did it rather than setting you up for success at the beginning. I am a big fan of threat modeling. Threat modeling is, I like to call it ‘evil brainstorming.’ You get together and you look at the design and you're like, “What are we doing and what could go wrong?”
RD A little premortem.
TJ Yeah, exactly. And then you try to fix things so that they do not happen. So if you could find one thing to do during each phase, I think that could be really helpful. For the coding phase specifically, that could be secure coding training, that could be implementing a peer review of the code, it could be using a static analysis tool to look for certain things. You can have a static analysis tool that's in your IDE that does little red squigglies, like, “Hey, that function's not so good. I'd really like it if you used this one instead.” I find in the IDE I like that a little better than later when I'm trying to pass the testing phase. I don't know, I like controlling things so I'd rather have control, but that's me. And so if you could have one thing during each phase of the SDLC, I think that you're going to release way, way, way, way better software. But if you're a really small shop and you can just do one thing, pick the thing that you think is easiest to implement and that you will actually do. So for instance, some companies, I've seen them where they're excited about threat modeling and I really like it, but they don't have enough people to actually do a threat model for every single project, and so then they're this giant bottleneck and then they're just causing problems and slowing things down. We don't want to slow things down unnecessarily. Yes, a threat model will take some time and then you'll look at the design and you might want some changes, but those changes are an improvement in the final product versus ‘I'm just waiting weeks for you because you don't have any time because you agreed to do this process and you actually can't support it at all.’ So I think it's really important to make sure you can support whatever it is that you decide, and so that's why a lot of people tend to use a lot of automated tools like static analysis or secret scanning or whatever it is you've decided to focus on.
[music plays]
RD All right. Well, ladies and gentlemen, we are at the end of the show– the part of the program where we shout out somebody who came onto Stack Overflow and dropped a little knowledge, shared a little curiosity. Today, we are shouting out a Populist Badge– somebody came onto a question and dropped an answer that was so good it outscored the accepted answer. And today we're shouting out Reishin for dropping an answer on: “Piping from stdin to a python code in a bash script.” So if you're curious about that, we have an answer and I'm sure there's security issues around doing that too. I'm Ryan Donovan. I edit the blog, host the podcast here at Stack Overflow. If you liked what you heard today, drop a rating and review. And if you would like to reach out to us with suggestions, guest mentions, comments, concerns, email us at podcast@stackoverflow.com.
TJ I'm Tanya Janca. I'm Staff DevRel at Semgrep. I also do secure coding training on the side, and I'm SheHacksPurple, so shehackspurple.ca is my blog and everything else. My new book, Alice and Bob Learn Secure Coding, is going to have monthly free online lessons just like the last book. And so if you sign up for my newsletter at newsletter.shehackspurple.ca, which is free, I will invite you. I'm not sure if I'm starting them in April or May or June, because I'm waiting to see how many people get their books because I don't want to start until everyone's got the first round of books, and there are about 2000 presales, so I need to make sure all those people get their books and then we will start. I'm going to record them, so if you miss them, it's okay. It'll be on my YouTube channel. If you miss one, no problem. But they're going to be free and open to the public and I'm going to have experts on for every single chapter to discuss all the content of the chapter, the questions at the back of the book, and answer any questions the audience has. And so if you bought the book, get the free lessons.
RD All right, everyone. Thank you for listening, and we'll talk to you next time.
[outro music plays]