The Stack Overflow Podcast

Feature flags: Theory meets reality

Episode Summary

Ryan is joined by Fynn Glover (CEO) and Ben Papillon (CTO), cofounders of Schematic, for a conversation about managing feature flags in software development. They explore theoretical and practical applications of feature flags, the issue of tech debt, and how orgs could manage entitlements and pricing models more effectively.

Episode Notes

Schematic offers SDKs for packaging, pricing, and entitlements. 

Check out Ben’s article on feature flags

Listen to Bill Tarr from AWS and Brian Rinaldi (then at LaunchDarkly and now at Localstack) talk about the opportunity to extend feature flags beyond deployment and rollout and into entitlement management and monetization.

Find Fynn on LinkedIn.

Find Ben on LinkedIn.

feature flags, software development, technical debt, business strategy, product management, feature management, DevOps, software engineering, pricing models, entitlements

Episode Transcription

[intro music plays]

Ryan Donovan Hello, everyone, and welcome to the Stack Overflow Podcast, a place to talk all things software and technology. I'm Ryan Donovan, I edit the blog here at Stack Overflow, and today I'm joined by two great guests: Fynn Glover, CEO and co-founder of Schematic, and Ben Papillon, the CTO and also co-founder of Schematic. We're going to be talking about feature flags– how to manage them on both a business and technical level. So gentlemen, welcome to the podcast. Glad you're here. 

Fynn Glover Thank you so much for having us, Ryan. 

Ben Papillon Thanks for having us. 

RD Of course. So at the top of the show, we like to get a little info about our guests. How did you get into software and technology? What's your origin story? 

FG Ben, do you want to lead or do you want me to? 

BP Sure, I'll jump into it. I've been kind of a startup journeyman for the better part of a couple decades here. I joined a startup in 2006 as a high school intern, a bunch of Virginia Tech students that were building an email product, and I kind of got bit by the startup bug doing that. I went to college and just kept studying computer science and learning how to program and doing different startups, and here we are, 20 years later almost, just still working on B2B SaaS. So I've been through a variety of different products throughout there, but generally have stayed fairly consistent as kind of an early stage engineering kind of persona on products like that. 

RD And Fynn, do you want to explain yourself, young man?

FG I guess ‘journeyman’ is the right word. I've been a startup journeyman as well. I founded my first company in 2012 and it originally was kind of a lifestyle business, and about four years in we saw an opportunity to pivot this services company into a software company. And when we made that decision, we needed somebody who could lead the company technically, and that sort of is how I eventually got introduced to Ben. This would have been 2016 or so. He was living in Austin, Texas. I was in Chattanooga, Tennessee. For some reason, he agreed to come to Chattanooga and join that company. We ended up raising a Series A, moving the company to Atlanta. That company ended up being acquired a couple of years later and then Ben went to another growth stage software company and I went to another growth stage software company, but behind the scenes at night, we were trying to figure out what's the next business to start together. That led to what we are building now, which we are happy to get into, but we founded this company about 18 months ago. And so now I think Ben and I have been working together for the better part of 10 years on a few different companies and a bunch of nighttime projects that never got off the ground, but this one has gotten off the ground. 

RD Okay. Well, let's talk about that project. Feature flags: I see articles about how hard they are, the issues with them, whether you should run experiments with feature flags, all the things. What is the problem with them? Why is it so hard to get feature flags right? 

BP I think there are a lot of reasons. There's kind of a theoretical background of how feature flags are supposed to be used, then there's what ends up happening in practice, and there's a big gap between the two. Basically, the premise with feature flags is that they should mostly be used for operational purposes: you're releasing a new feature but not all the pieces are done yet, so parts of it stay behind the flag until it's done. Or maybe you're ready to release it and you want to do a slow rollout, so you use the flag to do your slow rollout. A/B testing is another one that starts to get out of the scope of DevOps or CI/CD or operational use cases into more product or business-adjacent use cases. That's a common one: experimentation. But generally, the rule of thumb you see a lot is that these should be short-lived. Keep the flag in your codebase for as long as it takes you to get the feature fully rolled out, and then go back and tidy it up. If you don't, that's considered technical debt. I would say there are two main reasons for that premise. One, although they seem like such a constant presence now, feature flags did kind of emerge as part of the DevOps trend in the mid-aughts. Obviously just checking a flag in your codebase was a thing before that, but as a tool, it emerged during that period, especially with LaunchDarkly and companies like that getting traction. So part of it is just that it comes out of that DevOps world, and those are the use cases that got a lot of adoption initially for feature flag services. The other part is that if you have a bunch of flag checks in your codebase that you don't actually need, then either you have a lot of untested code paths in your codebase or you have a lot of tests to write, you know what I mean? 
So that's the downside of leaving a lot of flags in your codebase. But my experience, especially with fast-moving software companies, which is hopefully most of them, is that there tends to not be much appetite to go back and clean up after a feature ships. And it's not always made clear to the engineers when we are actually done with a feature. They might finish all of their work on the feature, put it behind a flag, and then maybe there's another team that manages the rollout, and there's never a moment where someone goes back and says, “Okay, you're cool to remove this flag now. We're done with it.” There's sometimes that disconnect, but for that or any of several reasons, flags often don't get cleaned up, so you end up with this sprawl regardless. Flagging tools will often have capabilities to help with this, like a last-checked date. If a flag you're done with is still getting checked, that's a red flag telling somebody, “Hey, we should remove this flag,” or vice versa, but it's a fairly unsolved problem for the most part. That's one thing that's very hard. That's what we sometimes think of as ‘zombie flags.’ 
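To make the slow-rollout and zombie-flag ideas concrete, here is a minimal sketch, assuming a flag is just a key, a rollout percentage, and a "last evaluated" timestamp. The `Flag` type, the hash-based bucketing, and the 30-day staleness window are illustrative assumptions, not any vendor's API. It flags two kinds of cleanup candidates: fully rolled-out flags that code still checks, and flags that nothing checks anymore.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Flag:
    key: str
    rollout_percent: int = 0                   # 0-100: share of users who get the feature
    last_evaluated: Optional[datetime] = None  # hypothetical "last checked" timestamp


def bucket(flag_key: str, user_id: str) -> int:
    """Deterministically map a (flag, user) pair to a bucket in [0, 100)."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).hexdigest()
    return int(digest[:8], 16) % 100


def is_enabled(flag: Flag, user_id: str, now: datetime) -> bool:
    """Slow rollout: a user is in if their bucket falls under the rollout percentage.

    Raising rollout_percent only adds users; nobody who already had the feature loses it.
    """
    flag.last_evaluated = now  # record that some code path still checks this flag
    return bucket(flag.key, user_id) < flag.rollout_percent


def cleanup_candidates(
    flags: list[Flag], now: datetime, stale_after: timedelta = timedelta(days=30)
) -> tuple[list[str], list[str]]:
    """Split flags into (remove the check from code, remove the flag from the tool)."""
    remove_from_code, remove_from_tool = [], []
    for flag in flags:
        if flag.last_evaluated is None or now - flag.last_evaluated > stale_after:
            remove_from_tool.append(flag.key)  # nothing checks it anymore
        elif flag.rollout_percent >= 100:
            remove_from_code.append(flag.key)  # fully rolled out, yet still checked
    return remove_from_code, remove_from_tool
```

Hashing the flag key together with the user ID keeps each user's assignment stable across requests while letting different flags bucket users independently.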

RD I was going to say, what is the actual issue with leaving this flag in there if there's no actual feature behind it? If there's still a flag in the codebase that's checked, and when it's checked, it actually does nothing, does it cause any harm on either a product or tech level? 

BP I think the harm caused is generally just added complexity. There's so much that engineers have to worry about already that having unnecessary things to worry about is a cognitive load cost: if you don't need it, you shouldn't have it, which is kind of the argument I just made about the excess code paths. But there's a flipside, too, which is that maybe the flag check has been removed from the code and now the flag is just cruft in the feature management (FM) tool. The flag still shows up in LaunchDarkly or Schematic or whatever tool, but flipping it does nothing. That does some harm, in the sense that I've been in situations where, let's say, a product manager said, “I'm going to give this feature to this user,” flipped the flag, said, “Hey, user, you're all set,” and then the user said, “You know what? I'm pretty upset that you said I got this feature and I didn't get it.” The product manager is confused, goes and talks to the engineer, and it turns out the switch does nothing. It's like pushing the button to cross the road when it's not plugged into anything. So there are some harms there. They're all within the context of communication and unnecessary complexity, I would say. 
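The "button that isn't plugged into anything" can be caught with a simple audit that compares the flags defined in the tool against the flags the code actually reads. This is a minimal sketch, assuming code checks flags through a hypothetical `check_flag("key")` call with a literal key; a static scan like this will miss keys built dynamically at runtime.

```python
import re

# Hypothetical call pattern: application code checks flags via check_flag("some-key").
FLAG_CALL = re.compile(r"""check_flag\(\s*["']([\w.-]+)["']""")


def flags_referenced(sources: dict[str, str]) -> set[str]:
    """Collect every flag key referenced across the given source files."""
    keys: set[str] = set()
    for text in sources.values():
        keys.update(FLAG_CALL.findall(text))
    return keys


def audit(tool_flags: set[str], sources: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (flags in the tool that code never reads, flags code reads that the tool lacks)."""
    used = flags_referenced(sources)
    unplugged = tool_flags - used  # flipping these in the tool does nothing
    undefined = used - tool_flags  # code checks a flag nobody can manage
    return sorted(unplugged), sorted(undefined)
```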

RD So on a sort of business consideration side, is there a way of thinking about feature flags that can minimize the eventual sprawl, the eventual tech debt from them? If you're only using it to differentiate product tiers, is that a reasonable way to think about feature flags, or is that missing out on the full power of it? 

BP I think it's a reasonable way. What I would argue, and what we would probably argue is the Schematic perspective in general, is that the feature flag sprawl I've been describing as a problem is something we should maybe just accept. It's kind of like the Planet Fitness approach to feature flags: you're going to have the sprawl regardless. So a better way of looking at it is probably, “Let's make sure you have the tools and the practices around feature flags so that you see it as okay to have long-lived flags in your codebase.” Not only does that mean you don't have to feel bad about it, it also means you can support use cases like the ones you mentioned, like entitlements and plans, which are inherently long-lived feature flags. So it's kind of a dodgy answer, but I would almost say that you should set yourself up so you're okay with it. 

RD You should accept what pains are going to be there at the beginning? 

BP And there are a lot of things you can do. Let's say you're thinking about it in a UI context. Maybe you have a component-based system like React: you could have a wrapper component that manages a lot of this conditionality in a common way, and include that in your routing system or something like that, so that it's not a sprawl of a bunch of if statements throughout your front end; it's more of a consistent pattern. Same thing if you're using flags for rate limiting or API access: do it with a middleware, so it's more of a setting that says, “Check this flag, don't check this flag,” that you could include in a config. I sometimes think of feature flags as if statements as a service, and in practice, anytime you see lots of if statements all over the place in code, that tends to be a smell, and you probably look for patterns to make that code more legible. This is no different, really. It's just the idea of preparing yourself ahead of time with your architectural choices to think that way about feature flags as well. 
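The middleware idea can be sketched in a framework-free way: one wrapper consults a route-to-flag config, so individual endpoints never contain flag checks. This is a minimal sketch under stated assumptions; the request shape, `ROUTE_FLAGS`, and the `is_enabled` callback are hypothetical stand-ins for whatever web framework and flag service you actually use.

```python
from typing import Callable

Request = dict
Response = tuple[int, str]
Handler = Callable[[Request], Response]

# Hypothetical config: which flag, if any, gates each route.
ROUTE_FLAGS: dict[str, str] = {
    "/api/reports": "advanced-reports",
    "/api/export": "bulk-export",
}


def flag_gate(
    handler: Handler,
    is_enabled: Callable[[str, Request], bool],
    route_flags: dict[str, str],
) -> Handler:
    """Wrap a handler so flag checks live in one middleware, driven by config."""

    def wrapped(request: Request) -> Response:
        flag = route_flags.get(request["path"])
        if flag is not None and not is_enabled(flag, request):
            return 403, f"feature '{flag}' is not enabled"
        return handler(request)  # ungated routes and enabled flags pass through

    return wrapped
```

Adding or removing a gate then means editing `ROUTE_FLAGS`, not touching handler code, which also gives tests a single place to exercise flag combinations.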

RD This is almost a way to not do multiple builds of a product in a lot of ways, to be like, “I'm just going to compile once, have it out there. I can manage all the tiering experiments, the slow rollouts.” I can manage user access control and have it all be, like you said, a bunch of nested if statements. But at what point do you say that maybe you're trying to do too much with feature flags? At what point are you like, “This is kind of a wonky architectural design. Is there a better way to do this?” 

BP It's a great question. I think you could hit that wall in a few different ways. For one thing, this rat's nest of if statements could definitely be an area where it's like, “Okay, the feature flags are maybe fine, but the way we're using them is making it really hard to test our product.” If you have this mindset of shipping one build with tons of options that can be altered at runtime, which is effectively what you're doing when you use a feature flag system that runs at runtime, you now have so many combinations of things that in theory you should be testing before you release a change. So taking a more systematic approach, like having a dedicated middleware that does the checking or a dedicated component layer in your router that does the checking, can sometimes make it easier to have patterns in your tests that manage that combinatorial complexity. Another way you could hit the wall is just having so many flags and not having an organizational system around them. Some flag might be meant for an A/B test that somebody ran six months ago, and that person has since moved to another team and doesn't even care about it anymore. The next flag might be super critical because it's actually gating access to a feature that's on your pricing page; people pay for it. There are some downsides to having different use cases for flags that potentially show up through an identical interface in your codebase. To deal with that in our own codebase, where of course we use Schematic to manage our own flags, we use custom hooks so that certain kinds of feature flags get different treatment. For example, if it's an entitlement, we have a specific React hook for that so we can say, “Okay, let's pull out usage of this. Are they over a limit?” and handle different states like that. Whereas if it's just an operational flag, we just use it. 
So to some extent, we've chosen in our case to have our call sites look different depending on the use case of the flag, but it is at least a unified service that's managing all of them, which as your question implies, can be an advantage and a disadvantage, depending on the context. 
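The "one service, different call sites" idea might look roughly like this in a backend language. This is a minimal sketch, assuming a single in-memory flag backend; `FlagService`, `flag_enabled`, and `entitlement` are hypothetical names standing in for the React hooks described above, not Schematic's actual SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlagValue:
    enabled: bool
    limit: Optional[int] = None  # only set for metered entitlements
    usage: int = 0


class FlagService:
    """Hypothetical in-memory stand-in for one flag backend serving every flag type."""

    def __init__(self, values: dict[tuple[str, str], FlagValue]):
        self.values = values

    def get(self, key: str, company: str) -> FlagValue:
        return self.values.get((key, company), FlagValue(enabled=False))


@dataclass
class Entitlement:
    allowed: bool
    usage: int
    limit: Optional[int]

    @property
    def limit_reached(self) -> bool:
        return self.limit is not None and self.usage >= self.limit


def flag_enabled(svc: FlagService, key: str, company: str) -> bool:
    """Operational flags: the call site only needs on/off."""
    return svc.get(key, company).enabled


def entitlement(svc: FlagService, feature: str, company: str) -> Entitlement:
    """Entitlements: the call site also needs usage and limits to render the right state."""
    value = svc.get(feature, company)
    return Entitlement(allowed=value.enabled, usage=value.usage, limit=value.limit)
```

Because the two helpers look different at the call site, a reader can tell at a glance whether a check is a temporary rollout switch or something a customer pays for.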

RD You talk about the sort of combinatorial complexity of all these flags. Do you think it's better to have the flags sort of individually own features, or is it better to have sort of larger flags? Maybe these larger flags flip other flags down the line. 

BP So you're thinking of dependent flags? Actually, there's a pattern: people talk about keystone flags, I think, which are essentially dependent flags, where you need one flag on in order to have another flag on. Another way of looking at it could be targeting inheritance. Is that what you mean? 
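Dependent flags can be evaluated recursively: a flag is on only if all of its prerequisite flags are on and its own targeting rule passes. This is a minimal sketch, assuming flags are stored as a dict with a hypothetical `rule` callable and optional `prerequisites` list; it also rejects cycles, since two flags that require each other could never resolve.

```python
# Hypothetical flag definitions: {"rule": callable(ctx) -> bool, "prerequisites": [keys]}
FlagDef = dict


def evaluate(key: str, flags: dict[str, FlagDef], ctx: dict, path: tuple = ()) -> bool:
    """A dependent flag is on only if every prerequisite is on AND its own rule passes."""
    if key in path:
        raise ValueError(f"prerequisite cycle: {' -> '.join(path + (key,))}")
    flag = flags[key]
    for prereq in flag.get("prerequisites", []):
        if not evaluate(prereq, flags, ctx, path + (key,)):
            return False  # a parent flag is off, so this one is off too
    return bool(flag["rule"](ctx))
```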

RD Whether you do the sort of inheritance or you have a sort of massive super flag that's like, “This is just all the enterprise stuff all at once.” 

BP Ideally you would have a massive super flag for an atomic unit of value. To use the entitlements and plans use case as an example, you might have on your pricing page that you get 10 widgets per month with this plan and 20 widgets per month with the next plan. There might be 90 different places throughout the front end and back end of your different codebases where you need to check different things related to this feature, so you might potentially have different flags under that heading related to soft limits, hard limits, things like this. Again, it depends on your tool, because some tools will have all these capabilities built into a single flag that has states to it, but ideally you would have one feature flag for widgets. If you use a supermassive flag, if that's the technical term, for the entire enterprise plan versus the basic plan, you've kind of robbed yourself of some flexibility. The purpose of having a runtime configuration system like a feature management tool is flexibility, so that if you want to say, “Oh, the enterprise tier has always exclusively had access to widgets, but now the basic tier is going to get them too because we're trying to expand our user base,” you'd be able to do that with the feature management tool, not with code changes to the codebase. So it does take some planning to decide what's the atomic unit of value that I want to gate in this way, but that would be my recommendation. 
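Gating on the atomic unit of value rather than on the plan name might look like this. This is a minimal sketch, assuming the plan catalog lives as data in the feature management tool; the widget limits come from the example above, while the `audit-log` feature and the function names are hypothetical.

```python
from typing import Union

FeatureValue = Union[bool, int]  # on/off feature, or a numeric limit

# Hypothetical plan catalog, kept as data rather than in application code.
# Each key is one "atomic unit of value"; "audit-log" is a made-up example feature.
CATALOG: dict[str, dict[str, FeatureValue]] = {
    "basic": {"widgets": 10},
    "enterprise": {"widgets": 20, "audit-log": True},
}


def feature_value(catalog: dict, plan: str, feature: str) -> FeatureValue:
    """Look up one feature for a plan; anything not listed is simply not included."""
    return catalog.get(plan, {}).get(feature, False)


def can_use(catalog: dict, plan: str, feature: str) -> bool:
    # Call sites gate on the feature, never on the plan name (no `plan == "enterprise"`),
    # so repackaging is a catalog edit instead of a code change.
    return bool(feature_value(catalog, plan, feature))
```

Giving the basic tier the audit log would then be a one-line data change, `CATALOG["basic"]["audit-log"] = True`, with no call sites touched.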

RD Fynn, I want to get your sense of what you see people using otherwise to manage this in the market. I could see a big file of hardcoded Booleans or whatever as the simplest version, but I'm sure there are complex ways to manage all of this. 

FG I'll speak broadly about the market for a second, and then I'll speak to how we started to get really opinionated about this space. Ben mentioned this: our point of view is that LaunchDarkly in many ways created this category of outsourcing feature management 10+ years ago. It's now become a relatively ubiquitous primitive that everybody is familiar with. What gets interesting, and what's maybe underappreciated by a lot of people, is the complexity that entitlements can place onto businesses as they grow. Often a business achieves product-market fit with a simple product, but eventually it evolves to support a variety of go-to-market motions, maybe a variety of billing models like seat-based or usage-based, and eventually, of course, multiple products too. That's quite a bit of dimensionality to support from a pricing and packaging perspective, and a lot of that complexity falls on product and engineering to figure out how to support. The business comes back to product and engineering and says, “We want to price and package and sell like this.” What Ben and I and our other co-founders learned is that the most common approach people took was hardcoding entitlements into the application code and then tying them to a plan ID in the billing system. Of course that's really easy to do early on; it's quick and dirty. But over time, as you go on that journey, it creates a lot of, you might call it technical debt, but you might also call it commercial debt, because now the business can't evolve pricing and packaging without asking a team of developers to go figure it out. 
The other thing we saw people do is try to use feature flags, but the core vendors in the space didn't really treat entitlements as a first-class citizen in their products. You could try to bend those tools to handle it, but there weren't out-of-the-box integrations with the downstream business systems, notably the billing systems or the CRMs or the ERPs. And then we saw a long tail of other ways that people would handle it. Sometimes people would handle it with JWTs, trying to do it through their auth provider. Sometimes people would handle it just in their database. But I think the most common was hardcoding early on, because it was so easy, and to do anything different would be to future-proof, and why do that? But it ends up being quite problematic for companies if they achieve scale and really want to start evolving pricing and packaging. When Ben and I started Schematic, our observation was that if you reimagine feature flags to go beyond deployment and rollout and into monetization, metering, and packaging, that's actually a really valuable use case for the primitive, because then the business can achieve a ton of pricing and packaging agility and remove developers from having to support pricing and billing initiatives, which they often don't like to do. It also allows you to be really flexible with individual customers: handle overrides, handle exceptions, and deal with customers, again, without having to bring developers into the picture. That could be handled by somebody in success or BizOps or product. So that's some of what we learned over the last two years exploring how people approach this. 
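Per-customer overrides, the kind of exception someone in success or BizOps might set, can be modeled as a layer on top of plan defaults. This is a minimal sketch, assuming both plan defaults and overrides are stored as data; the plan names, features, and the `resolve` helper are hypothetical. The comment shows, for contrast, the hardcoded plan-ID pattern described above.

```python
# Hypothetical plan defaults and per-company overrides, both stored as data.
PLANS = {
    "starter": {"seats": 5, "api-access": False},
    "growth": {"seats": 25, "api-access": True},
}
OVERRIDES = {"acme": {"seats": 40}}  # e.g. a negotiated exception, set without a deploy

# The hardcoded pattern would instead look something like (illustrative only):
#     if subscription.plan_id == "price_123": max_seats = 25


def resolve(company: str, plan: str, feature: str, plans: dict, overrides: dict):
    """Company-level override wins; otherwise fall back to the plan's default."""
    company_overrides = overrides.get(company, {})
    if feature in company_overrides:
        return company_overrides[feature]
    return plans.get(plan, {}).get(feature, False)
```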

RD That's interesting that you actually give the management of the flags to non-developers, is that right? 

FG I think part of what we envision here is that core CI/CD is owned by developers, and there are abstractions on top of flags. An entitlement is an abstraction on top of a flag that has a real commercial connotation, because it's effectively what you should have in the product based on your contract or your subscription. That entitlement should be manageable by somebody who owns monetization or pricing and packaging, and I think that more and more frequently lives in product. 

BP I would just add that feature flags are such a familiar interface that piggybacking on them is powerful, because then, like Fynn was explaining, it potentially alleviates future dev work if you can set up your monetization using a system built on such a primitive. But the way we think about it at Schematic is that we want to give appropriate interfaces to every persona in this story. For the developer, it's report usage using maybe a Segment-style API, because folks are very familiar with events, and then check access using a flag, because that's a very familiar interface, and so forth. But business users should have an appropriate role within the administrative web app that controls the system, where they can manage the plans, while lower-level targeting that pertains more to CI/CD maybe isn't relevant to their role. And something we kept hearing is that with pure-play feature management tools, you can do entitlements, and people sometimes do, but it's considered kind of a hack, partially just because the interfaces aren't ideal for those business users. The targeting interface is usually some type of Boolean logic form. It was very consistent in our customer research that the agility you'd want to get out of using flags for entitlements wasn't achieved in these contexts, because users in customer success or product or wherever just weren't as used to that type of interface. So you'd want something that's more like ‘manage plan’ rather than ‘manage this Boolean formula of feature targeting.’ So sorry if that was long-winded, but that's a big part of our vision with Schematic: appropriate interfaces for different user types. 
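The developer-facing half of that split, "report usage as events, check access like a flag," can be sketched with soft and hard limits. This is a minimal sketch, assuming usage events are named after the feature they consume and limits are configured as (soft, hard) pairs; `UsageTracker` and `LimitState` are hypothetical names, not any vendor's SDK.

```python
from collections import defaultdict
from enum import Enum


class LimitState(Enum):
    OK = "ok"
    SOFT = "soft_limit"  # still allowed, but warn or nudge toward an upgrade
    HARD = "hard_limit"  # quota used up: block further use


class UsageTracker:
    """Hypothetical in-memory stand-in for event-based usage reporting plus access checks."""

    def __init__(self, limits: dict[str, tuple[int, int]]):
        self.limits = limits              # feature -> (soft limit, hard limit)
        self.usage = defaultdict(int)     # (company, feature) -> running total

    def track(self, company: str, event: str, quantity: int = 1) -> None:
        """Report usage; assumes event names match feature names."""
        self.usage[(company, event)] += quantity

    def check(self, company: str, feature: str) -> LimitState:
        soft, hard = self.limits[feature]
        used = self.usage[(company, feature)]
        if used >= hard:
            return LimitState.HARD
        if used >= soft:
            return LimitState.SOFT
        return LimitState.OK
```

Developers only ever call `track` and `check`; where the soft and hard limits come from (a plan, an override) can then be managed by business users elsewhere.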

RD I think that makes sense. You give the nontechnical folks something they understand, and you almost bundle all these feature flags in the end. You don't have them turning them on and off individually. 

BP And a lot of times they feel strongly about their tools. If you've ever tried to get a sales team to stop using Salesforce, you probably have a story of failure, or the same with Stripe, billing integrations, whatever it might be. Maybe they're not as attached to the tool as sales teams tend to be to their CRM of choice, but they're certainly going to be pretty finicky about integrating in the exact way they want, because the money needs to be right if you're touching that. Those tools themselves might be the appropriate interface. Maybe they can manage things from within the CRM: just by virtue of closing a deal that puts the company into a certain plan in Salesforce, the company gets provisioned correctly through the FM tool. That's a nice way to picture it, to make it, again, appropriate or convenient, however you want to say it, for that set of users. 
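Provisioning from the CRM could be wired up as an event handler that maps a closed deal to a plan assignment. This is a minimal sketch, assuming a hypothetical event shape (`type`, `company_id`, `product`) and a hypothetical product-to-plan mapping; it is not Salesforce's actual webhook payload or API.

```python
from dataclasses import dataclass


@dataclass
class Company:
    id: str
    plan: str = "free"


# Hypothetical mapping from CRM product names to plans in the entitlement system.
CRM_PRODUCT_TO_PLAN = {"Enterprise Annual": "enterprise", "Team Monthly": "team"}


def handle_crm_event(event: dict, companies: dict[str, Company]) -> bool:
    """Provision a plan when a deal closes; ignore everything else.

    The event shape is an assumption for illustration only.
    """
    if event.get("type") != "deal.closed_won":
        return False
    plan = CRM_PRODUCT_TO_PLAN.get(event["product"])
    if plan is None:
        return False  # unknown product: leave it for a human to review
    company = companies.setdefault(event["company_id"], Company(event["company_id"]))
    company.plan = plan  # every flag and entitlement derived from the plan follows
    return True
```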

RD So do you ultimately aim to automate it through these other interfaces, just have the click of a button in Salesforce flip a bunch of flags? 

BP That's definitely a way it can work, yes. And like I said, I think our ultimate goal is to have this work well in a way that each user type thinks is correct for them. If they don't want to go into some other tool to manage which plan a customer is on, they shouldn't have to, especially if they probably have to do that in Salesforce already. We don't want to make people do two things when they could do one. But we do want the tool to be powerful enough that, especially for an early-stage company that hasn't expanded into all these different teams that have their own SaaS contracts and things, you could just start with Schematic and grow from there, and it becomes your mission control for everything. So we want it to be possible to approach it from either direction. 

FG To me, what's so interesting about this conversation is that we started it all about feature flags and why they're hard: what are the patterns, what's the original theory, what's the real reality? We're moving into their extension into pricing and packaging and maybe even our vision a little bit, but what was so fascinating to us in the discovery around the product we're building, and the overall space, was that feature flagging could do more for businesses. It's done an immense amount for product and engineering teams, especially developers and especially DevOps, but it could do a lot for businesses. And you could hear that pent-up vision for what flags could do from a lot of people who had worked on great feature flagging products. They saw the power, but the market just hadn't come to recognize, whether from inertia or not enough tooling, that you could build a world in which the feature flag runs CI/CD, but then the business can achieve all this flexibility, and it's reconciling between application code and downstream business systems so you're not dealing with huge homegrown systems that developers are constantly building and maintaining to support the way the business evolves. That, I think, was the thing that was so interesting to us. We're focused on startups, but we were interviewing growth-stage $20 million, $30 million businesses that were spending millions of dollars in engineering resources every year just trying to manage monetization, and none of it was core functionality. 

RD It's interesting that you said earlier that you're basically piggybacking on this known feature flagging process. Do you think there are additional features you can piggyback on with the feature flagging? Is there more that they can do? 

FG Well, I'll speak from my vantage point and then, Ben, you layer on. There are a couple of things that we have heard a lot of pull around. The first is that if you are powering an entitlement system, you have some really interesting context about a business. You have usage data at the granular feature level combined with the financial data for the customer, or even for the cohort of customers on the plan, and that presents really interesting opportunities for business analysis and business intelligence; there's a lot you can do there. The second thing is that companies, especially companies that pick dev tools, are under an enormous amount of pressure to deliver very consumer-grade purchasing experiences to the end customer. I've run growth at a business. Everywhere you go, every time you hear the word growth, it's all about reducing friction for the end buyer. That necessitates building a lot of different purchasing experiences, from upgrade paths to downgrade paths to usage meters to customer portals. All of those are effectively components that could easily be dropped into and embedded in an application, but they rely on underlying entitlements infrastructure so that they serve up the right information to the right customer at the right time. So those are maybe two examples of things that we think you could do on top of this foundation. Ben, what would you say? 

BP That second one about UI components, I think, is a big, non-obvious one, just because doing actual UI is pretty far afield of feature flags themselves, but it was kind of a light bulb moment for us. The original idea was very much in the FM space, but then we realized, “Well, we have all the usage data about your features. We know what all your plans are and how much you charge for everything, and we know who has access to what.” We kept hearing in customer interviews that a common pain point for CTOs was, “I have this page in my app that's my plan page and it sucks to work on. We can't redo it because it'll break everything, no one wants to work on it, and half of my customers, when they go to that page, don't even see the right thing because they're on some legacy plan and we don't have that condition handled.” It felt like something nobody wanted to deal with, and we were well-positioned to swallow up all that complexity and just give you a drop-in component, because, again, it's just outsourcing a bunch of conditional logic to another service. So it seems like it doesn't really fit into the feature management world, but in our conception of it, it was a very intuitive lateral move. So that's a big one. The analytics-type use cases Fynn mentioned, people ask us for those. I tend to push back until I feel like we've really nailed some of the more core stuff, just because there are a lot of really good analytics products out there, but we'll probably get there at some point. There's a lot to do within plan management. We want to be really exceptional at, “Okay, I want to try out some new pricing. Let's just try out maybe this packaging on only some cohort of customers.” That's the kind of thing that's really, really hard and annoying to do and might have no results, so those ideas tend to live and die in the Google Doc and the meeting. By the time it comes down to, “Let's implement this,” everyone realizes we're not sure what the results are going to be and the cost is going to be high, so we're just not going to do this experiment. We want to unlock more of that.
Fynn was mentioning feature flags getting all that adoption as part of a DevOps trend. I've been a big believer in the original DevOps vision and the DevOps manifesto and all that stuff, and a big part of it, which feature flags are part of, is continuous delivery. I feel like that has been a big boon to software quality overall, because you can run your tests, and you can really isolate when something went wrong to just this one release, just this one change. And I feel like there are similar operational things we can offer to businesses, like, “Hey, if you want to tweak this plan by a tiny amount and see what result that has on your business, you can do that now, and it has borderline no cost to you. Or if you want to make this feature an add-on that you can buy à la carte rather than something included, just change that. You can do that experiment and find out what impact it has on your business.” Those are things that are really part of the core value prop, but I think they're going to have so much impact if people start adopting them that I'd really like to see that play out. 
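Turning an included feature into an à la carte add-on can be a pure data change if a customer's access is computed as "features their plan includes, plus add-ons they bought." This is a minimal sketch, assuming a hypothetical packaging dict with `plans` and `addons` entries; the feature names and helpers are illustrative. A real experiment would likely scope the change to a cohort of customers rather than apply it to everyone.

```python
def effective_features(plan: str, purchased_addons: list[str], packaging: dict) -> set[str]:
    """What a customer can use: plan-included features plus add-ons they bought."""
    included = packaging["plans"].get(plan, set())
    sellable = packaging["addons"]
    # Only count purchases of things that are actually sold as add-ons.
    return included | (set(purchased_addons) & sellable)


def make_addon(packaging: dict, feature: str) -> None:
    """Packaging experiment: stop including `feature` in plans and sell it separately."""
    for features in packaging["plans"].values():
        features.discard(feature)
    packaging["addons"].add(feature)
```

For example, with a hypothetical `{"plans": {"pro": {"widgets", "exports", "sso"}}, "addons": set()}`, calling `make_addon(packaging, "sso")` means only Pro customers who purchased the `sso` add-on still get it, and no call site that checks `sso` has to change.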

RD I can see why people are asking for analytics. If you're running all these little experimental packages, did that work? 

FG Again, we started our company 18 months ago. We're an early stage company. We're really excited about the companies that are adopting the product because, to us, they seem like really good companies who are kind of buying into this vision that flags can go do way more. But when I kind of reflect a little bit back on the research before we even started the company, I think there were three things that we kind of consistently heard, and two of them were from engineering and one of them was from the business at large, and it was interesting to compare the overlap. From engineering, we often heard the root cause of the complexity is entitlements. That is a really hard problem. And then the second thing we would often hear is, especially from startups, “I hired Stripe to own billing and payments, but then as I grew, I had to build all this additional infrastructure around Stripe or between Stripe and my application and I don't want my team to have to be doing that. I don't want my engineers to have to be on billing initiatives every six months.” So those are the two things we heard from engineering, and then on the business side, we would hear statements like, to Ben's point, “We've stopped talking about pricing, because it would require pulling a core team into a sprint to do this and we just have to go ship new features.” And the more we heard that, the more we started to say to ourselves, “Well, it's kind of crazy that a really fast-moving business that's shipping all this new product and investing in all this new product can’t actually monetize that product.” The product roadmap is so disconnected from the pricing and packaging roadmap in most companies, and that's a real missed opportunity for companies, especially those where so much of their core expense as a business is going into application development.

[music plays]

RD Well, thank you very much, folks. It's that time of the show where we shout out somebody who came on Stack Overflow and asked a good question, dropped some knowledge, helped out other developers. Today, we're shouting out Remy Lebeau for winning a Lifeboat Badge. He gave an answer that reached a score of 20 or more on a question that had a score of -3 or less. The question was: “How to fix 'std::logic_error' what(): basic_string::_M_construct null not valid error?” If that makes sense to you, go check out the question. It made sense to almost 600,000 other people. I'm Ryan Donovan, I edit the blog here at Stack Overflow. You can find the blog at stackoverflow.blog. And if you want to reach out to me with comments, questions, topics to cover, you can find me on LinkedIn. 

BP I'm Ben Papillon, I'm the CTO at Schematic. If you want to find out more about Schematic, go to schematichq.com. If you're looking for more about me, I can be found on Bluesky and X. 

FG I'm Fynn Glover, I'm the co-founder and CEO of Schematic. I've worked with Ben for eight years, and if this has been interesting to you, I'd encourage you to read his essay on feature flags theory versus reality. It's rife with Simpsons images, which makes it all the more enjoyable. And then you can find me on LinkedIn or Twitter/X.

RD And we'll include the article in the show notes. Thank you, everybody, and we'll talk to you next time.

[outro music plays]