The Stack Overflow Podcast

How to build open source apps in a highly regulated industry

Episode Summary

Today we chat with Reshma Khilnani, co-founder and CEO of Medplum, an open-source platform enabling companies to build healthcare applications like EHRs and patient portals. She discusses how to iterate rapidly in an industry where SOC2 compliance is just the beginning (one of the compliance tests is named after Dante’s epic poem depicting the nine circles of hell, if that gives you an idea).

Episode Notes

Before Medplum, Reshma founded and exited two startups in the healthcare space – MedXT (managing medical images online, acquired by Box) and Droplet (an at-home diagnostics company acquired by Ro). Reshma has a B.S. in computer science and a Master of Engineering from MIT.

You can learn more about Medplum here and check out their GitHub, which has over 1,200 stars, here.

You can learn more about Khilnani on her website, GitHub, and on LinkedIn.

Congrats to Stack Overflow user Kvam for earning a Lifeboat Badge with an answer to the question: 

What is the advantage of using a Bitarray when you can store your bool values in a bool[]?

Episode Transcription

[intro music plays]

Ryan Donovan Hello, everyone, and welcome to the Stack Overflow Podcast, a place to talk all things software and technology. I'm Ryan Donovan, I edit the blog here at Stack Overflow, and I have a great guest for you all today. We’re joined by Reshma Khilnani, who is the CEO of Medplum, an open source platform that enables healthcare companies to build applications. We're going to be talking today about how to move fast without breaking things in a highly regulated industry, and how you can build the compliance directly into the build process. So Reshma, hi, welcome to the show.

Reshma Khilnani Ryan, thanks so much for having me. I'm happy to be here. 

RD Of course. So at the top of the show, we like to ask our guests how you got into software and technology. What's your origin story? 

RK Ooh, the origin story. It goes back a really long way. I was born in Palo Alto on the Stanford campus, so I think it's kind of got a lot of tech culture, and I was interested in programming from a young age and really liked to build web apps and that type of thing. So I'd say the joy of building was my initial introduction to it, and I still love it today.

RD Nice. So talk about what it's like creating software in a regulated industry. What sort of constraints do you have on you that other software companies don't have? 

RK So Medplum, we make an open source electronic health record development platform. So if you're making a custom health record, like if you want to make one just for pediatric special needs kids, for example, and you want to make an app just for them to record their records and manage their care, we are for that. And in the regulated industry, there are so many regulations and programs that govern these healthcare applications, so you need to support certain kinds of interfaces mandated by the government, you need to support data privacy, of course, and HIPAA, those types of regulations. What's interesting from the perspective of the developers is that all of these kinds of programs have a specific technical implementation as well. So the Centers for Medicare and Medicaid mandates that you have to support a certain sub-kind of OAuth 2.0, it's called SMART on FHIR, and you have to have authentication with specific scopes and a specific data model in order to pass their certification to be a health record in America. So from the developer perspective, you can't just make up your data model, often. You can't just make up your identity management system, your change management system. You inherit those from the regulations. And sometimes the people who are writing the regulations, they may not have that high fidelity view into how to develop the application, so there's a layer of interpretation as well: how to map what's written in the regs to the regulated environment, interpret it correctly, and be able to stand up to the audit that you're going to have to face.

RD So the SMART on FHIR, it's a specification like OAuth 2.0, right? 

RK It's OAuth 2.0 plus a FHIR data model –FHIR is a medical schema– and then it has scopes. So if you're a patient, you get a certain scope with these certain permissions, and if you're a practitioner, a doctor, you get a different set. 
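
To make the scope idea in this answer concrete, here is a minimal sketch of how an application might check SMART-style scopes. It assumes the older "context/ResourceType.permission" scope syntax (for example patient/Observation.read); the function names, the example grants, and the decision to ignore the patient/user context when matching are all assumptions made for illustration, not Medplum's implementation or the full specification.

```typescript
// Hypothetical sketch of SMART-style scope checking, assuming the
// "context/ResourceType.permission" syntax. Not a complete implementation.
type Permission = "read" | "write";

interface ParsedScope {
  context: string;      // "patient", "user", or "system"
  resourceType: string; // a FHIR resource type such as "Observation", or "*"
  permission: string;   // "read", "write", or "*"
}

// Parse one scope such as "patient/Observation.read". Scopes in other
// formats (for example "openid" or "launch/patient") return null.
function parseScope(scope: string): ParsedScope | null {
  const match = /^(patient|user|system)\/([A-Za-z]+|\*)\.(read|write|\*)$/.exec(scope);
  if (!match) {
    return null;
  }
  return { context: match[1], resourceType: match[2], permission: match[3] };
}

// Decide whether a space-separated scope string allows the requested
// permission on the requested resource type. The context is parsed but not
// enforced here; a real server would also restrict "patient/" scopes to the
// records of the patient in the launch context.
function isAllowed(grantedScopes: string, resourceType: string, permission: Permission): boolean {
  return grantedScopes
    .split(/\s+/)
    .map((scope) => parseScope(scope))
    .some(
      (s) =>
        s !== null &&
        (s.resourceType === "*" || s.resourceType === resourceType) &&
        (s.permission === "*" || s.permission === permission)
    );
}

// Illustrative grants only: a patient who can read observations, and a
// practitioner who can read everything and write observations.
const patientGrant = "openid launch/patient patient/Observation.read";
const practitionerGrant = "openid user/*.read user/Observation.write";

console.log(isAllowed(patientGrant, "Observation", "read"));
console.log(isAllowed(patientGrant, "Observation", "write"));
console.log(isAllowed(practitionerGrant, "Condition", "read"));
```

The point of the sketch is only that the same token format carries different permission sets for different kinds of users, which is what lets one API serve both a patient portal and a clinician-facing app.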

RD I've definitely worked with OAuth 2.0 and looked at the specifications. Is this SMART on FHIR as complicated or does it sort of trim it down to be a little more specific and get rid of some of the extra cruft that's in there in OAuth 2.0? 

RK It gets rid of a little bit of the cruft but maintains most of it and then adds on the extra, the special scopes, the data model. And they call it SMART App Launch, so it's special hyperlinks that support launching with the right context and those are built into the spec. Healthcare is interesting in that the Center for Medicare and Medicaid publishes a test harness called Inferno, named after the nine circles of hell.

RD That good, huh? 

RK That good. They named it. But Inferno is a test you can run your app through to ensure that it's going to pass all of the scenarios that they want. It has a lot of the OAuth 2.0 testing etc. in it. 

RD So it sounds like so much is sort of specified, so much is limiting for you. How do you keep that sort of pace of development going? How do you go fast and not violate HIPAA or whatever else is out there? 

RK So I was on the Facebook, now Meta, team more than 10 years ago. And I think in the regulated industry, you can still move fast, but you’ve got to use different techniques. Test-driven development is your best friend in the regulated industry. What we end up doing is we have a very rigorous test-driven development framework. We're actually an open source company, so if anyone is skeptical about what they're hearing here today, please go to github.com/medplum/medplum and you can see a reference implementation of how exactly it works. But say you need to, for example, support these OAuth 2.0 specific implementations. Having a test for that, and the test is tagged with what regulation it is supporting, and your documentation in source control is also tagged with that specific regulation, that type of change management driven by your SDLC is extremely useful in this regulated context, because they're like, “When was it released? When did it pass the tests? When the new version of the regs came out, when did you incorporate it? When did you document that you did it?” So all of those questions are answerable by source control, and source control and SDLC has been an invaluable tool in allowing us to move fast and ensuring that we're not breaking things. One of the issues with these regulated industries is also that you have a big surface area of functionality you have to support but it's not always used that frequently. It's required for regulation, but it may not be frequently exercised. So that test-driven development to make sure there's no regressions is also super helpful in that context.
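
One way to picture tests that are tagged with the regulation they support is a small registry that runs each test and groups the results by tag. This is a hedged sketch only: the regulatedTest helper, the redirect URI check, and the EXAMPLE-... labels are hypothetical placeholders, not real regulatory citations and not how the Medplum repository is organized.

```typescript
// Hypothetical sketch: every test carries the regulation labels it supports,
// so a run can be reported per regulation. All names are illustrative.
interface TaggedTest {
  name: string;
  regulations: string[]; // placeholder labels, not real citations
  run: () => boolean;    // returns true when the check passes
}

interface TestResult {
  name: string;
  passed: boolean;
}

const registry: TaggedTest[] = [];

// Register a test together with its regulation tags.
function regulatedTest(name: string, regulations: string[], run: () => boolean): void {
  registry.push({ name, regulations, run });
}

// Toy function under test: only pre-registered redirect URIs are accepted.
function redirectUriIsRegistered(uri: string, registered: string[]): boolean {
  return registered.indexOf(uri) !== -1;
}

regulatedTest("rejects unregistered redirect URI", ["EXAMPLE-OAUTH-REQ-1"], () =>
  !redirectUriIsRegistered("https://evil.example/cb", ["https://app.example/cb"])
);

regulatedTest(
  "accepts registered redirect URI",
  ["EXAMPLE-OAUTH-REQ-1", "EXAMPLE-AUDIT-REQ-2"],
  () => redirectUriIsRegistered("https://app.example/cb", ["https://app.example/cb"])
);

// Run every test once and group the results by regulation label.
function runByRegulation(tests: TaggedTest[]): Record<string, TestResult[]> {
  const report: Record<string, TestResult[]> = {};
  for (let i = 0; i < tests.length; i++) {
    const test = tests[i];
    const passed = test.run();
    for (let j = 0; j < test.regulations.length; j++) {
      const label = test.regulations[j];
      if (!report[label]) {
        report[label] = [];
      }
      report[label].push({ name: test.name, passed });
    }
  }
  return report;
}

const report = runByRegulation(registry);
console.log(JSON.stringify(report, null, 2));
```

Committing a report like this alongside the code is one way source control history could answer the "when did it pass?" questions described above; a real project would more likely use its test runner's own tagging or naming conventions.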

RD Do you have any leeway on those tests or is that part of the Inferno testing suite? Is that like you have these tests and these are the things it has to pass all the time? 

RK You don't have much leeway– you have to pass. The way that the ONC certification works is that they don't monitor your build system, that's a rare thing, but they have an examination that you have to pass and you go into a lab and you demonstrate that your application passes. And so we run it all the time and we make sure things aren't regressing, but the real audit and the test is more annually. 

RD Gotcha. So like you said, this is open source software. Does that add complications to the regulatory compliance framework? 

RK It can add confusion. I think regulators often don't really think in an open source way, and so the idea that code is different from product and those nuances are sometimes hard to navigate. I'd say it's very helpful in many respects in the regulated environment, mostly from the community perspective. We have our kind of open source product and our open source community, and we've seen people who work on the test harness come and join and they're like, “Oh, great. There's an open source implementation, so I can test my test harness.” And we kind of have this community flow of information and implementation, so in that way, it's very beneficial. But there is a learning curve to many in industry. Sometimes people feel like open source couldn't be HIPAA compliant because something might be exposed. Us as devs realize that that's not really the way it works, but we do come across that from time to time. 

RD I've heard in some ways open source is more secure in that you have other people who aren't involved in the company looking at the code, testing it, figuring out what it does.

RK That is my belief, too. 

RD So for other people to build, do you have to deal with infrastructure issues? Do you have to worry about where this code is hosted, where the things are hosted, or is that somebody else's problem? 

RK We have our hosted environment, and in the regulated industry, some types of compliance follow the code and some types of compliance follow the environment. And so we have to have a really nuanced rubric to know which functionality falls in which category. For example, in HIPAA compliance, there's a lot of tools like Vanta or Secureframe where you can kind of scan your environment to make sure that it's up to spec for HIPAA. And what we provide as a product is our code. We use infrastructure as code, CDK, all that stuff, to ensure that when you install the application, it's going to pass those scanners. So we have a delivery mechanism like that. 

RD So like you said, test-driven development must have a complicated CI/CD process. How do you maintain a good CI/CD process and also have it be faster?

RK I think good CI/CD and making sure that you have a continuous release so your system doesn't calcify is one of the tools that we keep to move fast. I think that we keep a close eye on all the updates to GitHub Actions and Turborepo and those tools. We're very well versed in them and the team kind of looks closely at how to get the workers to run faster, more parallelized, and we invest as a core competency in the CI/CD for this purpose. The real benefit of the CI/CD we see from a cultural perspective is that we want to have it so robust that we feel confident that when we make changes we're not going to hit a wall or be at high risk of regressions. We don't want to all have our fists clenched when we release, we want to be just so comfortable to roll out because we've done that investment. And so that test-driven development, unit tests, integration tests, CI/CD, that does a lot of stuff, code scanners, these are all tools that are helping us achieve the velocity that we want and it's always a work in progress.

RD I'm curious if there's either code patterns or software engineering patterns that you found that are by nature noncompliant, that are risky on the face of them. 

RK There are always things that to me smell a little bit and I'm concerned about them from a regulatory perspective. Unauthenticated access, even to features that you think might not be exposing any data, always look at those with a fine-tooth comb. Anytime you're introducing things like cookies or client-side instrumentation, you always need to look at that carefully. The client is generally untrusted. The identity systems and how they actually ensure identity, what checks they do and their standard, that's a big one. And any type of impersonation– “Oh, you can run as this user. You can shadow someone's account,” laser eyes at those, across many regulatory frameworks. We support many: SOC 2 Type 2, HIPAA, ONC, and CFR. They all have elements that can trace back to those patterns.

RD I've definitely seen some skepticism about admin routes in general. And I'm also curious– you mentioned cookies. Is there any way that cookies or any sort of client-side saving can leak HIPAA information? 

RK If you use it wrong, yes. The healthcare regulators are actually pretty sophisticated about this now, and there's been judgments of folks who are passing health data to advertising networks, for example, and have received fines. So absolutely, it's not even theoretical, it has been an issue. But the mastery of your data flow, what data is stored where and upon, that's key to having a great high velocity and compliant application. 

RD Obviously data is so important for any regulated industry, and the software engineers who are building those data platforms, creating the data pipelines, I would assume they have to have some pretty specialized skills or at least be aware of it. What do those engineers need to know and be able to do to work in a regulated industry? 

RK Certainly in the regulated industries, cross-functional skills are just very highly valued. If you're building an app and you need to certify it or go through audit, you're going to be working with legal compliance, often finance, you may need to do vendor assessments, so work across the organizational boundary. And so engineers with those skillsets and the ability to break apart problems from regulations to English, to code, to process, that is extremely valuable. And I don't think that there's specific training along these lines, but you definitely see people develop those skills across their career. And it happens at all levels. We see from our customer base, folks who are very skilled at it on the other end. They're doing technical diligence on us about whether we would pass their compliance regulations. And you can see by their questioning their cross-functional skill and their communication skill on these fronts. 

RD Right. It's compliance all the way down. Everybody using your product has to be compliant too. 

RK That's right. And they need to be sure that they could not hit a roadblock at some point in the future due to some technical limitation.

RD Does the regulated industry change how you have to do logging and observability? 

RK Logging is very important in almost all of the regulated industries and at a few layers. First you have to have logs, they have to be linked to identities and you have to log different parts of your maintenance– for example, key rotation, things like that. And then you need to ensure that your logs can't be tampered with, and a lot of regulations have this so there's access control on the logs and changes to the logs and configuration for the logs are also monitored. The techniques that we find really useful for this are the infrastructure as code and set up your roles, your logs, your bucket logging and all exactly the right way so that you do it the same every time and you know that you have a good configuration on that front. 
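
The tamper-resistance idea in this answer can be illustrated with a hash-chained audit log, where each entry stores the hash of the entry before it so that a later edit is detectable. This is a swapped-in illustration, not the approach described in the interview (which relies on access control and infrastructure as code); the AuditEntry shape, the field names, and the example actors are assumptions. It uses Node.js's built-in crypto module.

```typescript
import { createHash } from "crypto";

// Hypothetical audit entry: linked to an identity and chained to the
// previous entry by hash. Field names are illustrative.
interface AuditEntry {
  actor: string;     // identity the event is linked to
  action: string;    // for example "key-rotation"
  timestamp: string; // ISO 8601 timestamp supplied by the caller
  prevHash: string;  // hash of the previous entry ("" for the first)
  hash: string;      // SHA-256 over prevHash and this entry's fields
}

// Compute the SHA-256 hash that binds an entry to its predecessor.
function hashEntry(prevHash: string, actor: string, action: string, timestamp: string): string {
  return createHash("sha256")
    .update(JSON.stringify([prevHash, actor, action, timestamp]))
    .digest("hex");
}

// Return a new log with one entry appended; the input log is not mutated.
function appendEntry(log: AuditEntry[], actor: string, action: string, timestamp: string): AuditEntry[] {
  const prevHash = log.length > 0 ? log[log.length - 1].hash : "";
  const hash = hashEntry(prevHash, actor, action, timestamp);
  return log.concat([{ actor, action, timestamp, prevHash, hash }]);
}

// Recompute every hash in order; any edited or reordered entry breaks the chain.
function verifyLog(log: AuditEntry[]): boolean {
  let prevHash = "";
  for (let i = 0; i < log.length; i++) {
    const entry = log[i];
    const expected = hashEntry(prevHash, entry.actor, entry.action, entry.timestamp);
    if (entry.prevHash !== prevHash || entry.hash !== expected) {
      return false;
    }
    prevHash = entry.hash;
  }
  return true;
}

let auditLog: AuditEntry[] = [];
auditLog = appendEntry(auditLog, "alice@example.com", "key-rotation", "2024-01-01T00:00:00Z");
auditLog = appendEntry(auditLog, "bob@example.com", "log-config-change", "2024-01-02T00:00:00Z");
console.log(verifyLog(auditLog));

// Simulate tampering: change the actor on the first entry without rehashing.
const tampered = auditLog.map((e, i) => (i === 0 ? { ...e, actor: "mallory@example.com" } : e));
console.log(verifyLog(tampered));
```

A chain like this only detects changes; someone who can rewrite the whole log can also recompute every hash, which is why the access controls and monitored log configuration mentioned above still matter.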

RD Is there a benefit to just finding the way that works best and just doing that all the time and not trying to try something new?

RK Certainly in healthcare, you can expect change. There will be a new version of the regs every year, so there are best practices, but you've got to kind of have the flexibility. For example, CMIA: California has specific health laws and they're like, “This data that was previously not health records is now health records. You need to log it and treat it like that.” So you’ve got to be able to flex into that zone, and having a great change history of what you've done in the past and why is going to help you there. And we use our documentation, we put it in source control and we look through the history of it for this purpose frequently.

RD It's interesting that I've seen a few other startups in the medical industry starting now, and I think HIPAA felt like a limitation before and all these things felt like a limitation before, but it sounds like a lot of these regulations are giving folks the shape of how to make this software, right?

RK Yes. 

RD And how to do it ethically. 

RK Totally. How to do it ethically, and also people focus a lot on HIPAA having the data privacy, and that's a pillar of it, but there's also that you have to have the request be available to patients when they ask for it, and there's that side of it, too. So I think as more folks enter the industry, which I'm super excited about, the increasing kind of literacy and tooling to support is great. 

RD Is there any tooling that's come out in the last few years that you think has made it easier for you and other folks in regulated industries? 

RK Certainly the compliance tools like Vanta and Secureframe are really great in this regard. I am a big fan of things like CDK and those have also made a big difference. We use Docusaurus a lot and I think it's an underrated tool for this use case. People write docs or make collateral, and if they're not versioned or linked to specific commits, it's just hard to reconcile what's happened, so I like those types of tools and frameworks. And then certainly all of the advancements in GitHub Actions, CI/CD, those make a big difference too, so love those.

RD Being able to build all this in automatically saves some time.

RK I'll shout out to the regulators, too. I do think that they have become more sophisticated over time. The original HIPAA regs from the ‘90s are not as technically informed and the CMIA and updates that have rolled out more recently, they're like, “Well, there's an open source test harness,” or something like that, which are an innovation. 

RD It is nice. As somebody who has gone to doctors before, it seems like it's been all in manila envelopes for a long time. How do you think the increasing digitization of health care will change healthcare in the future?

RK We're still early days in the digitization of healthcare. Now a lot of the data is in digital form, but there's a lot of unstructured data so it's kind of hard to wrangle in many respects. And in my experience spelunking in a lot of medical systems and touching their underbelly, the data quality is often mixed. I think that AI and these tools that can handle a lot more modalities of that kind– unstructured, poor data quality, duplicates, that type of thing– there's a bright future for it. Lots of medical data is generated and never used again and I think that's a missed opportunity as well. So I think it could have a high impact to even bring the health care data closer to maybe what we see in fintech or something like that. At least it's used, it's put in a ledger that's balanced. Things that will increase the quality of care and make lives of practitioners easier.

RD Is there also a risk that the day comes when a specialist can access my old profile with an API? Is there a danger to that or do you think we'll be pretty careful about it?

RK Governance is a big issue, and I would say that it's unsolved. At present, there's these national networks for data exchange: CommonWell, Carequality, and then regional ones as well. And the governance is mostly if you have a treatment purpose to treat that patient, you can make the request. The identity matching is not robust, so I'd say it's a work in progress, though a lot of thought has gone into it. I was mentioning back in the beginning with the SMART on FHIR scopes– in a future world, let's stretch our minds to patients having a good idea of their data model of their data, and they could grant access to certain subsets. You can see my lab results, but my therapist notes– no. Something like that I think could be in the future and we would have tooling to have that be nuanced, but I'd say it's a work in progress today. And certainly data washing around without governance, I can't advocate for that as practice.

RD I heard a story recently of a hacker going into Finland and leaking basically all of their therapy records. And that seems like a nightmare, but I do look forward to a future where all my records are hosted in a secure cloud provider that I control. 

RK And your demographic and allergies, maybe that being easily accessible has a lot of value.

RD Is there anything we haven't touched on that you want to get to? 

RK I would love to think about from your perspective, are there community wide initiatives towards the regulated industry that you see sitting at Stack Overflow? 

RD I mean, like I said, we've talked to a couple of folks with medical startups. Obviously, I think that the PCI area, the transaction credit card area, they've done a good job of self-regulating, but I think the medical industry still feels new and it still feels a little squickier.

RK I did work in payments before working in medical, and one thing that's hard and different is that if a transaction goes wrong, you can be made whole by the payment, but things like privacy or changes to your insurance and things like that are harder to rectify and that's a big issue. I'm glad to hear that there's some interest in it. I do think that the arc mirrors FinTech. What has happened in FinTech in the past 10 years, healthcare will have a similar transformation. 

RD I agree. And I think any industry where getting things done and getting what you want is kind of a pain is ripe for innovation.

RK Indeed.

[music plays]

RD All right, everybody. It's that time of the show again, where we like to shout out somebody who came on to Stack Overflow, saved a question from the dustbin of history, and added a little knowledge. Shout out to Kvam for answering the question, “Bitarray vs. bool[].” Not really a question, but if you've wondered what the difference between a bitarray and a bool[] is, we have an answer. I'm Ryan Donovan. I edit the blog here at Stack Overflow. You can find it at stackoverflow.blog. If you like the show today, please leave a rating and review. And if you want to reach out to me, you can find me on X @RThorDonovan.

RK I'm Reshma Khilnani, co-founder and CEO of Medplum, and glad to be here on the show. You can find me at www.medplum.com, or at github.com/medplum for our open source repositories.

RD All right, everybody. Thank you very much, and we'll see you next time.

[outro music plays]