The Stack Overflow Podcast

Battling ticket bots and untangling taxes at the frontiers of e-commerce

Episode Summary

On today's episode we chat with Ilya Grigorik, a Distinguished Engineer and Technical Advisor to the CEO at Shopify. From battling hordes of bots trying to scalp seats before humans can get their hands on concert tickets to automatically handling relevant tax codes and regulations across countries and states so small merchants can focus on their business, Ilya shares some of the projects he enjoys most and the challenges that make e-commerce interesting for software developers.

Episode Notes

You can find Ilya on LinkedIn here.

You can listen to Ilya talk about Commerce Components here, a system he describes as a "modern way to approach your commerce architecture without reducing it to a (false) binary choice between microservices and monoliths."

As Ilya notes, “there are a lot of interesting implications for runtime and how we're solving it at Shopify. There is a direct bridge there to a performance conversation as well: moving untrusted scripts off the main thread, sandboxing UI extensions, and more.” 

No badge winner today. Instead, user Kaizen has a question about Shopify that still needs an answer. Maybe you can help! 

How to Activate Shopify Web Pixel Extension on Production Store?

Episode Transcription

[intro music plays]

Ben Popper Take charge of your cloud workloads. Control your performance, stack, and cost with Equinix Dedicated Cloud. Sign up at deploy.equinix.com with code CLOUDCONTROL, all caps, for a $300 credit. Get your control freak on with Equinix. 

Ryan Donovan Welcome to the Stack Overflow Podcast, a place to talk all things software and technology. My name is Ryan Donovan, I'm the Editor of the blog here at Stack Overflow, and I'm joined by the illustrious Cassidy Williams. 

Cassidy Williams Ooh, illustrious, hey.

RD So much luster. How are you doing today, Cassidy? 

CW I am good. I am very excited to talk to our guest today. 

RD Yes, our guest. We're going to be talking about the complexities of checkout with our guest, distinguished engineer at Shopify, Ilya Grigorik. How are you doing today, Ilya? 

Ilya Grigorik I'm doing great. Thank you for having me on the show.

RD Of course, of course, our pleasure. So before we get into the mysteries and labyrinthine complexity of checkout, tell us a little bit about how you got into software and technology. 

IG Oh boy. Well, I guess I started building websites way back in the late ’90s. I built my first amazingly embarrassing Angelfire website, and then it was all downhill from there.

CW Wow. 

IG And a fun anecdote from that– I remember uploading bitmap files because I didn't know any wiser. I would use Microsoft Paint and create my beautiful pill icons and build a frames navigation and I was wondering why it was taking forever to download. And I think ever since, I've been paying down those sins. I've spent about a decade working on web performance, working in the standards community and all the rest. So I think it's all just starting right from there. 

RD That's your penance, right? 

IG Yeah, forever. 

RD And you went from the humble beginnings to a distinguished engineer. How long have you been at Shopify? 

IG I've been at Shopify for about three years, which frankly feels like a decade and that's a good thing. It's an amazing playground. 

CW It's also been a particularly weird three years in the world, so that makes sense because time is weird. 

IG Yes, absolutely. The last three years have been a rocketship for commerce. 

CW Oh, for sure. 

IG And being at Shopify has exposed me to a lot of the interesting complexity of the space. I've done a lot of due diligence, obviously, before joining Shopify, and I appreciated that commerce is not a simple domain, but I am still to this day amazed every day about all of the intricacies and the depth of the space. 

RD Let's get into that complexity. My previous job was at Grubhub, which obviously rests a lot on checkout and carting, and I was amazed at how much complexity there is in the checkout process, even for something where you control all the transactions. So what is it like having the checkout process as a service? 

IG So it's really interesting because on the surface, it's just a web form. It's a bunch of fields, so how hard could it be? And that's a little bit confusing. Hey, you have a paintbrush and a set of paint. Well, now you’ve got to paint a Rembrandt. The tools are not the process and not the outcome. So the really hard part here is exactly as you pointed out– it's not really about the forms, it's about the complexity of the underlying processes. So I've spent about a year now really digging into our checkout platform working with the teams, and it is really kind of a crucible where all the many product and platform features have to come together and work together in a coherent way. And as a little bit of context of what makes this problem particularly complicated, at Shopify we support millions of merchants. Just recently we crossed 1 trillion in cumulative GMV that has gone through our platform, and that's with a T– a trillion. 

CW That's wild. 

IG It’s a pretty massive number. And Shopify today powers about 11% of e-commerce GMV in the United States, so that's just kind of the scale of the operation. 

RD And for listeners, what is GMV? 

IG It's the Gross Merchandise Value. So think of the contents of your cart. So if you bought $100 worth of goods, now you add that number cumulatively, and over the history of Shopify, we recently crossed 1 trillion. And I suspect that it'll be 2 trillion at some point in the future, which will be much shorter than it took us to get to this point. So first it's just the scale. Second is the complexity. When you support millions of merchants, you're working with an infinite amount of business requirements, because while many businesses are the same, and yes, they sell online, but we also support offline. We have our POS and other products, so you can go in store. You have to think about all the different variations of discounting strategies, the gifting programs, the loyalty programs, the delivery capabilities. Maybe you want to offer pickup in store, or you maybe have a pickup point that’s supported, or you want to run a promotion, or you want to bundle products. You also have to do taxes, you have to do all these things. And by the way, all of those things are expected to just work magically and work together. So whenever you add one other thing like, “Oh, let's just add pickup points,” that actually propagates through all of the rest of the business processes and how we present it and it just needs to work. And the last one I'll highlight here is the scale at which we operate. Shopify is also home to a lot of the cult brands and icons. So a very common occurrence is flash sales, which is when some event happens, your favorite pop star drops a social message saying, “Hey, I have a release of X,” and all of a sudden you have millions of users descending on your platform that all want to check out at once. And in addition to all of those actual humans, you also have bots that are attacking the system, trying to figure out how I secure this inventory so I can resell it. So it becomes this really interesting challenge that you have this massive spike in requests. 
How do you handle that? How do you do fair queuing? How do you distinguish bots? How do you do all of those things? And of course, everything just needs to work. And it's really hard because historically these sorts of things would be like Black Friday where commerce platforms significantly scale up their infrastructure to accommodate this sort of growth, but this stuff just happens on the internet today because your social star just does a selfie with a product and all of a sudden it's like, “Oh my God, I need to sell a million of those widgets.”
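As a rough illustration of the fair-queuing question raised above, here is a minimal sketch of a first-come, first-served waiting room. It is not Shopify's implementation, which the conversation does not describe. The `WaitingRoom` class, the `client_id` values, and the fixed `admits_per_tick` budget are all assumptions made for the example, and real bot detection is out of scope.

```python
from collections import OrderedDict


class WaitingRoom:
    """FIFO waiting room: one place in line per client, fixed admissions per tick.

    Hypothetical model for illustration only.
    """

    def __init__(self, admits_per_tick):
        self.admits_per_tick = admits_per_tick
        # Insertion order doubles as arrival order.
        self._queue = OrderedDict()

    def join(self, client_id):
        """Add a client if absent and return its 1-based position in line."""
        # Re-joining never moves a client, so hammering the endpoint gains nothing.
        self._queue.setdefault(client_id, None)
        return list(self._queue).index(client_id) + 1

    def tick(self):
        """Admit up to admits_per_tick clients, oldest arrival first."""
        admitted = []
        while self._queue and len(admitted) < self.admits_per_tick:
            client_id, _ = self._queue.popitem(last=False)
            admitted.append(client_id)
        return admitted
```

Capping admissions per tick smooths the spike for the backend, while the one-slot-per-client rule is a crude fairness measure. Telling humans from bots would need signals this sketch does not model.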

CW It's kind of like how Amazon basically invented their own holiday of Prime Day, and suddenly that's when a ton of commerce happens. And then you see a bunch of other merchants being just like, “We'll do Prime Day too.” And it's made up, but then the sea of people come anyway. 

IG Yep, exactly. So it's the confluence of all of those things– the millions of merchants, the business complexity, and the operational complexity that, first, you have to support that many people, but also be able to scale up and scale down and all the rest, which makes it a really interesting engineering SRE challenge and product challenge.

RD When we talked before this, you had an interesting three C's of checkout. Can you talk a little bit about what those three factors are and how do they affect what you do every day? 

IG So for checkout internally, we talk about conversion, composability, and compliance. Those are the promises. So for folks that are not familiar with Shopify, we provide a number of services. One of them is a hosted checkout, and hosted checkout is a managed runtime and a managed service where you can bring your own customizations, whether that's branding requirements or some functional requirements, but we provide the actual runtime and we provide certain guarantees about it. So we can just dig into each one of these. Conversion is about speed and capabilities. And when we talk about capabilities, it's some of the things we already mentioned. It's helping folks zoom through the checkout experience. That may be surfacing relevant payment methods or delivery methods, providing the right address inputs and validations, which by the way, differ pretty dramatically across different geographies. You have localization, things like translations, and then compliance opt-ins and opt-outs. And of course, performance. Milliseconds and bytes absolutely matter in the space, so how do we deliver an experience that is objectively fast, but also feels fast? Because some of it is just how do the components render, when do they show up, are you overwhelmed with the amount of information or are you able to progress kind of quickly through it? The next one is composability. A lot of the merchants that start on Shopify are not technical. It's mom and pop shops, somebody who has a kick ass barbecue sauce. They know nothing about selling or commerce, but they want to try their hand at entrepreneurship. And Shopify tries to meet them at exactly that level and lower the bar to getting started, which is a very complex challenge. This goes far beyond just check out. It's about how do you get started, how do you open a bank account, how do you get a loan, how do you accept payments, how do you do all of those things? 
But for checkout specifically, the composability comes also from that we provide this managed runtime, but then we have our partner ecosystem that develops apps and extensions such that you don't have to rebuild everything on your own. So you can go and install an app to provide, let's say, a better gifting experience. 

RD I think I actually used the Shopify extension for WordPress for my wedding registry, so thank you for that. 

IG So that's a great example. You did not have to build your own or go bespoke or hire an engineer. And by the way, if you have something bespoke, you can hire an engineer and build a custom app and then provide the functionality. But it's that ability to reach in and to, “Hey, there's probably some other set of merchants that already had a similar need that can install this app,” but then you can also combine it with other applications. So you can pick or snipe three or five, configure it to your business, and that'll probably take you pretty far. At a certain scale at a certain point, you might end up with an engineering team or enough budget to hire somebody to build a very bespoke thing that mirrors your particular needs because you're selling cookies in New York City and you have a very particular way of doing fulfillment and you just need a custom logic, that's totally cool, you can do that. But we're trying to make this process simple, upgrade-safe such that we can actually keep the platform evolving and shipping new features without the apps interfering with that, and also easy to maintain. So the name of the game in all of this is the cost of ownership. It's one thing that you set something up, but then do you also have to keep a consulting industry on retainer just to keep the lights on? And that's where we're trying to lower that bar and assist our merchants with that if you're on our platform, you’ve installed these apps, there is an upgrade path that we can guarantee. The apps can also version and upgrade themselves safely and all the rest, which requires that we set up a pretty well-structured API interface and contract between each of these places. And the converse of that would be actually something that we've offered before but are now sunsetting, which is effectively raw access to HTML and JavaScript on the checkout page. It feels very powerful and it's really great because keys to the kingdom, go crazy. But then guess what happens? 
Developers start targeting specific DOM elements and selectors, and they start assuming certain things. And the moment you do that, we can't safely upgrade you to the newest and latest version of some new set of capabilities. So a big shift for us has been to get onto this managed runtime and establish all of these protocols and APIs such that we can provide all of this upgrade safely. And then the last one I'll mention, the last C is compliance, and boy, it is a jungle out there. It's a regulatory bazaar. Every country has its own developing set of differences and some of them are simple and as funny as that some require that a checkbox appears above or below a certain element, often that matters. So imagine that if you're building a checkout yourself and you need to account for that, that's really painful. But as a managed runtime, we can actually just bake in that by default and say, “Hey, you're accessing checkout from Sweden. The regulation says that the checkbox must be visible above the fold or below the fold, so we can just put that there, but maybe we don't do that everywhere else.” So we have to adjust and adapt for each one of those. Those would be examples of regional compliance. There's of course things like accessibility. Europe has its own set of requirements with the European Accessibility Act, US has ADA, and then of course security. And PCI is something that we've spent quite a lot of time recently thinking about, because PCI v4 is coming into effect next year and it's kind of a big deal for anybody that does commerce. 

RD Well, I think PCI in general is the big one just because it's the payment card industry. I had a question about the conversion piece. I think one of the things I saw when I was working at Grubhub is that you have to account for failures in the back end, but you can't charge the person twice. How do you solve that conundrum? 

IG So it depends on where the failure happens. So there's payment gateways, and then there's failures somewhere in your own backend and processing. So the way we set up our system is that we try to decouple those things by default. And sometimes you have to reject and stop the user, and sometimes you can remediate it after the fact. So typically when you complete a checkout, we will do an authorization, but we may delay the actual job for capturing and finalizing the payment until after the fact. In fact, some merchants, I don't know if you've ever run into this, you'll press pay, it'll give you a confirmation and then a day later, they may email you saying, “Hey, we've actually tried charging the card and it's not going through.” So it's an example of this delayed processing because there's performance reasons for that, there's also the ability to batch and do other things. So there's a variety of ways to handle this. 
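The split between authorizing at checkout and finalizing the charge later can be sketched as below. The idempotency key is a common technique added here to show how a retried job avoids a double charge. The conversation does not say Shopify uses it, and `PaymentLedger` and its methods are hypothetical names.

```python
class PaymentLedger:
    """Toy model of authorize-now, capture-later with idempotent retries."""

    def __init__(self):
        self._authorized = {}  # idempotency_key -> amount in cents
        self._captured = set()  # keys that have already been charged

    def authorize(self, idempotency_key, amount_cents):
        """Place a hold; repeating the call with the same key is a no-op."""
        self._authorized.setdefault(idempotency_key, amount_cents)
        return self._authorized[idempotency_key]

    def capture(self, idempotency_key):
        """Return the cents newly charged by this call (0 when it is a retry)."""
        if idempotency_key not in self._authorized:
            raise KeyError("no authorization for " + idempotency_key)
        if idempotency_key in self._captured:
            return 0  # a retried background job must not charge twice
        self._captured.add(idempotency_key)
        return self._authorized[idempotency_key]
```

Because `capture` is safe to call repeatedly, a background job that crashes midway can simply be rerun.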

CW All of the different rules and regulations across countries is one that always blows my mind, even just when it comes to invoicing. Different countries have different rules around, “Okay, this is what a receipt should have,” or, “This is what a person should be able to do to buy something.” And navigating all of that sounds like just such a big problem to solve where there's so many different ways that you can solve it and none is probably perfectly optimal. It's a very interesting problem to work on, but that compliance layer is… 

IG Just to illustrate that point, even if we just zoom in back on the US, let's put aside all the complexities of the rest of the world. There are thousands of tax jurisdictions in the United States. So depending from where you’re fulfilling and where you are sending to, you need to account for different taxes, the taxes will differ. So just solving that alone, you can build, or people have built, there are existing dedicated companies just solving that problem. And as a merchant, if you have to take on that burden, that is just an impossible task, which is where and why platforms like Shopify really come in and shine. Because for a number of these things, there's only one right answer and we should do our best as a platform to just bake in the right behavior so you don't have to think about it. I just want to focus on selling my barbecue sauce. I don’t want to become a tax expert. 
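To make the jurisdiction problem concrete, here is a deliberately simplified sketch in which the tax on an order is the sum of rates from every jurisdiction covering the destination. The jurisdiction names and rates are placeholders, not real tax data, and real rules (origin-based sourcing, exemptions, product categories) are far more involved.

```python
# Hypothetical jurisdictions with rates in basis points (1 bp = 0.01%).
# These are placeholders for illustration, not real tax rates.
RATES_BPS = {
    "STATE-A": 625,
    "COUNTY-A1": 100,
    "CITY-A1X": 50,
    "STATE-B": 0,
}


def tax_cents(subtotal_cents, jurisdictions):
    """Sum the stacked rates and return tax in whole cents, rounding half up."""
    total_bps = sum(RATES_BPS[name] for name in jurisdictions)
    # Integer arithmetic avoids floating-point drift on currency amounts.
    return (subtotal_cents * total_bps + 5000) // 10000
```

Even this toy version needs a lookup from address to stacked jurisdictions, which is the part that is hard to maintain across thousands of them.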

RD Do you have lawyers/developers on your team? 

IG Oh, we absolutely do. We have all the skill sets. We have in-depth security engineers, PCI compliance experts, tax people. And this is one of the things that makes my job really interesting at Shopify, which is, as I said, the depth of this domain is basically infinite because the amount of pain there is in entrepreneurship is infinite, and Shopify is trying to solve many of those frictions and lower the bar as much as possible.

RD You talked about moving from the HTML and JavaScript on page to a managed sandbox runtime. What does that actually mean for the user and how does that work? 

IG So for the user, it's actually completely invisible. They experience a checkout form like before, but how it's powered under the hood makes a great difference in what kind of promises and what kind of capabilities we can provide to the merchant and also assurances. So we manage the top level page. So if you think about how the page is rendered, the problem or the challenge we have is that we want to isolate any untrusted code, third party code, from the trusted code, which is the first party code, for a variety of reasons. The way you achieve that is you isolate or sandbox all of the third party code, which sounds great in theory, but it's neither simple nor easy to get right. So one of the things we want to provide is the ability for merchants to customize branding. Okay, well, let's build an API and a way for you to basically provide your custom style sheet so you can apply that. Okay, we built that. Next, what about if I want to provide a custom widget on a form for maybe a better gifting experience or a registry? How do I do that? That's a good example of something that Shopify should support, but may not ever build as a first-party capability because there's just too many of these things. So we built some technology there, it's an open source library called remote DOM where we leverage web workers and we allow partners or third party code to execute within a web worker to build up a tree of DOM elements that are then reflected back into the parent page. But that mirroring is flowing through a vetted channel controlled by Shopify, so we can guarantee that, first of all, you're using our predefined components, the components have the right attributes, they're not going to violate our performance or security guarantees. So that allows us to isolate each one of those. 
Similarly, checkout is a critical surface for all of the marketing and conversion tracking, which is a massive problem for most merchants because it's both a necessary and an unnecessary evil: you just end up with hundreds of kilobytes of code for this stuff. And in a managed runtime, you also isolate all the pixels into their own sandboxes. Getting there means that we have to establish a protocol for all of the events, give the analytics platforms capability to emit custom events and communicate across so you have kind of a pub/sub interface, and also provide access to all the relevant DOM tree events that happen so they can continue performing all the tracking but in an isolated way. And this gives you a really nice property of performance isolation. So now you're running in a web worker, so if you have some really nasty loop or you’re just doing something very heavy, you're actually on a completely different thread. So that's nice because then the interactive nature of the page is not compromised, which is actually a very common problem. So Core Web Vitals has the INP (Interaction to Next Paint) metric. That's basically a nonissue for Shopify checkout because all of the third party code is strictly isolated. It also allows us to provide very strict security guarantees because we control the top level page. So if we perform all of the security audits and we maintain all the compliance, then you can rest assured that you don't need other third party tools or vendors to do that, and we’ll actually come back to PCI, which I mentioned before, and why that's such a big deal. But I think those are the two main ones. 
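The "vetted channel" idea can be modelled without a browser: sandboxed code sends a plain-data description of the UI it wants, and the host accepts only predefined components and attributes before rendering anything. This is a language-neutral sketch in Python, not the remote DOM library's API. The component names, the allowed properties, and `validate_tree` are assumptions made for the example.

```python
# Hypothetical allowlist: component type -> properties the host will accept.
ALLOWED_COMPONENTS = {
    "Stack": set(),
    "Text": {"content"},
    "Checkbox": {"label", "checked"},
}


def validate_tree(node):
    """Return a sanitized copy of an untrusted UI tree, or raise ValueError."""
    kind = node.get("type")
    if kind not in ALLOWED_COMPONENTS:
        raise ValueError("component not allowed: %r" % (kind,))
    props = node.get("props", {})
    unknown = set(props) - ALLOWED_COMPONENTS[kind]
    if unknown:
        raise ValueError("properties not allowed on %s: %s" % (kind, sorted(unknown)))
    # Rebuild the node so that only vetted fields reach the host page.
    return {
        "type": kind,
        "props": dict(props),
        "children": [validate_tree(child) for child in node.get("children", [])],
    }
```

Because the host rebuilds the tree from vetted fields instead of trusting the incoming object, extension code has no way to smuggle in a script tag or an inline event handler.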

CW That's all so interesting because, again, that security element outside of all of the compliance things that we've already talked about, you have to think about how can you make enough things run on the client side that you don't have to worry about a server being hammered too much, but you also don't want any sort of client-side scripting to mess with a proper checkout setup.

IG Right, so it also gives you upgrade safety. So because we decoupled these things and we defined the interfaces, we can actually guarantee that we can actually move this element or rename this element or just reimplement it, and you still have a stable target if you need to interact with it or query some data, but you're no longer doing a DOM selector for a thing that might not exist. And in a previous world, that would break. We would actually have to freeze merchants on a particular version and say, “Hey, if you want to move to the newest, shiniest, best version, you have to take some set of manual steps to update your code, to validate that it's not going to be a problem and all the rest,” because the last thing we want to do is break your checkout. This is literally the most critical surface on your entire storefront where money changes hands, so we have to be conservative. So having all of this isolated and behind these interfaces is a really big win. 

CW There was a point where I was building kind of an experiment with headless Shopify and the API. Is there anything on, like, the headless API side and the remote DOM side and everything that is kind of in that sandbox that you've talked about where they conflict? Were there any challenges there because of that kind of upgrade path? 

IG No. So for checkout, whether you're building a custom storefront using your favorite front end framework or maybe even mobile app or VR experience, choose whatever tool is best. We provide a set of defaults. And then when you initiate checkout, you will load the web URL, which is the Shopify checkout, or we provide some SDKs for mobile developers as well which will pop a native sheet with an integrated checkout experience. So you don't have to worry about the complexity of checkout, which is really nice and really powerful.

CW That's so cool. 

RD It sounds like you're almost creating another separate shadow DOM with it and I know some frameworks have done that too. Is there a sort of issue in general with direct DOM manipulation that you and everybody else is trying to solve? 

IG So it mostly comes down to what assurances we can provide and how strictly we can enforce that. So by isolating this code into a sandbox environment, we control the bridge which allows us to define a protocol and maintain that. And I'll give you a very concrete example of why this matters and where this matters. So I recently mentioned PCI. So for those that are not familiar with checkout, PCI is the Payment Card Industry Data Security Standard (PCI DSS). And there was PCI v3, which has been the state-of-the-art for about a decade. I believe the v3 was 2013 or so. V4 is coming into effect in April next year, so we have about nine months, and it has a number of changes or updates, but the most impactful one in many respects for web developers is that it tries to address the problem of skimming. So in v3, what was really important is to protect input of sensitive data, so your credit card information effectively. So if you wanted to offer a checkout and you wanted to accept a credit card, you had to build a process where the input of the credit card data would be provably secured and protected from security attacks. And the way the industry has solved that is that we've put all those forms behind iframes. So whenever you're looking at a checkout page, you can bet that whenever you have a credit card input, it's actually iframed, and that iframe is served by a completely distinct service that has all of the PCI compliance checks and balances. Now, that's great except for what happens if the top level page is compromised. Let's say some bad script got in there, supply chain attack or otherwise. What it does is it just wipes the current secure element, the iframe, and replaces it with something else, or it puts an overlay on top. This is a very common attack, which is sometimes referred to as Magecart or a skimming attack, and it'll result in credentials being stolen. So v4 actually says, “Well, hold on folks. 
We actually need to extend the same amount of protections or the same protections to the top level page.” So now you can't just isolate PCI requirements to the iframe, you also have to provide inventory and maintain inventory of every script that executes in your top level page. You have to provide authentication– the right scripts are being executed– and you also have to guarantee integrity. So think of your typical commerce experience where you will put custom scripts, you'll have your marketing department deploy a tag manager which will then bring in a host of unknown trackers that themselves will load other resources. This becomes effectively unmanageable as a compliance requirement. You have to rethink the entire process. And it is also very onerous. How are you going to inventory? How are you going to authenticate that all of these are the right scripts being loaded? How are you going to guarantee the integrity? I don't know how the rest of the industry is going to be solving this. I think this is really, really complicated. But for Shopify, because we control the runtime and control the top level page, we can actually do all the work for you to provide those assurances. We have internal processes that provide the inventory. We know what scripts we will run at that top level context. We know that we've isolated all of the third party content into secure web workers and we control the bridge so we can provide assurances there. We can also perform all the authentication. We can deploy CSP and subresource integrity policies on our contents to ensure that it is all tightly secured. And best of all, this is not retroactive security where you observe after the fact that something has changed, an anomaly has been detected or a new script executed. This is actual runtime enforcement so you can rest assured that this will be correct. 
And I think the only way to get there really is via managed runtime, which is obviously a big asset for what Shopify offers today, and I suspect where most everyone else will have to trend towards as well in the commerce industry, because otherwise it becomes very hard to manage a compliance requirement. 
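The inventory and integrity requirements described above can be illustrated with a small sketch: keep a record of every approved script together with its Subresource Integrity hash, and refuse anything that is missing from the record or whose bytes have changed. The `sha384-` plus base64 format is the standard SRI encoding. The inventory structure, the URL, and `verify_script` are assumptions for the example and say nothing about how Shopify implements this.

```python
import base64
import hashlib


def sri_hash(body):
    """Return the Subresource Integrity value for a script body (bytes)."""
    digest = hashlib.sha384(body).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")


def verify_script(url, body, inventory):
    """Allow a script only if it is inventoried and its bytes are unchanged."""
    expected = inventory.get(url)
    return expected is not None and expected == sri_hash(body)


# Hypothetical inventory: approved script URL -> expected integrity value.
APPROVED_BODY = b"console.log('checkout');"
INVENTORY = {"https://cdn.example.com/checkout.js": sri_hash(APPROVED_BODY)}
```

In a browser the same hash would go in a script tag's `integrity` attribute, with a Content Security Policy restricting which origins may serve scripts at all, so the check is enforced at load time.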

CW That's so complex. It's cool, but kind of like what you were saying earlier, it's just infinite complexity, but it's interesting to hear about all of it.

RD When you start combining this with the tracking, it seems like you’ve got to be extra careful.

CW So you talked about remote DOM and I know that Shopify also owns Remix and a bunch of other open source projects, and I really appreciate that Shopify has so much out in open source in general. Is there anything that, in your opinion, could benefit from being more open source at Shopify, or less in any way? Because I feel like open source is both a blessing and a curse depending on the technology that you're working with. 

IG Absolutely. I think there's lots. We've open sourced and contributed a lot to front end technologies. We've been heavy contributors to the Ruby and Rails ecosystem. We've built some really interesting infrastructure technology kind of by necessity. I think we may be managing one of the largest MySQL fleets out there in the world and we built some pretty sophisticated tooling and infrastructure around it. Frankly, I wish we wouldn't have to, but we did and I suspect the world would benefit. I know that the team has had some conversations and thoughts about open sourcing some of that. But if we're to take a step back, I think with Tobi, who's our founder, part of the genesis story of Shopify was that he was a Rails contributor, he was a Rails core contributor. He didn't set out to build a commerce platform. He was using Rails to build his own store, and then he built a store and he realized that actually the more valuable thing is the software that I built, not the store, so pivot there. So I say that because open source and contributing to open source runs in the DNA of the company. So we couldn't have built Shopify without open source. We know that, we relish that, and we look to contribute to open source in any way possible, whether that's through standards, through direct core contributions, or even sponsoring the relevant projects. 

CW I love that. Cool. 

RD Ilya, was there something you wanted to touch on that we didn't?

IG I think the only thing I'll repeat is just how interesting this space is. There is an infinite amount of pain, as I said, in e-commerce. I think naively when I talk to many other engineers it’s like, “Oh yeah, you guys build a storefront and I guess you run a checkout.” Well, that's true, yes. That's one of the most visible artifacts, but the actual mission of the company is to enable more entrepreneurs to exist on this planet. And that problem is amazingly complicated. How do I open a bank account? How do I open a company? How do I do taxes? How do I do fulfillment? And we're trying to tackle each one of those, which makes for really interesting product and technical challenge. So I've thoroughly enjoyed my time in the space and I encourage folks to also look at commerce broadly, Shopify in particular, and I think it's a space that will continue to grow and be very, very important for the future of our economy. 

RD I think that's right. As more commerce moves online, people will need more and more help.

IG Yeah. Don't bet on commerce online going the other direction.

[music plays]

RD As we do at the end of the show, I'm going to shout out a question. We have not had many Lifeboats this summer, so I'm going to shout out a question about Shopify on Stack Overflow. The question is, “How to Activate Shopify Web Pixel Extension on Production Store?” This was asked by Kaizen five hours ago. So if you can help Kaizen out, get over there. I've been Ryan Donovan. I'm Editor of the blog here at Stack Overflow. You can currently find me on X. Reach out to me if you have questions, concerns, ideas. And if you liked what you heard today, drop a rating and review. It really helps. 

CW I'm Cassidy Williams. You can find me @Cassidoo on most things, or my website, Cassidoo.co. 

IG You can find me at IGrigorik on most platforms. Please feel free to reach out. And thank you for hosting me. I've really enjoyed this conversation. 

RD And we'll talk to you next time. 

CW Bye!

[outro music plays]