The Stack Overflow Podcast

Inside Intuit's generative AI system, GenOS

Episode Summary

SPONSORED BY INTUIT

In today’s episode of the podcast, sponsored by Intuit, Ben and Ryan talk with Shivang Shah, Chief Architect at Intuit Mailchimp, and Merrin Kurian, Principal Engineer and AI Platform Architect at Intuit. They discuss generative AI at Intuit, GenOS (the generative AI operating system they built), and how GenAI can scale without sacrificing privacy.

Episode Notes

Intuit shares more about their generative AI operating system (GenOS) in this Medium blog post.

Learn more about Intuit technology here.

Congrats to Lifeboat badge winner Mohsin Naeem for answering the question “How can I extract metadata from an MP3 file?”

Connect with Merrin on LinkedIn or X.

Connect with Shivang on LinkedIn.

Episode Transcription

[intro music plays]

Ben Popper Hello, everybody. Welcome back to the Stack Overflow Podcast, a place to talk all things software and technology. I am your host, Ben Popper, Director of Content here at Stack Overflow, joined as I often am by the maestro of our blog, our newsletter impresario, Ryan Donovan. How's it going, Ryan? 

Ryan Donovan Oh, pretty good. What's the news, Ben? 

BP So today’s episode is brought to us by Intuit. You may know them from companies like TurboTax, Credit Karma, QuickBooks, Mailchimp, lots of popular consumer-facing businesses there. I think Intuit might've been one of the first clients that I worked with when I came to Stack Overflow way back in 2019 when you and I were still at the New York City office. I did an interview that went up on the blog and the topic was about AI and a bit of how that was being integrated into some of their services. Five years have passed and a lot has changed, but we are still often talking about AI, which has become in some ways a sort of operating system that everybody's trying to bring on board. So we have two great guests today. I'm going to let them introduce themselves, and then we're going to dive into a little bit of what they've been working on with something called GenOS. So without further ado, Shivang, do you want to introduce yourself? 

Shivang Shah Absolutely. Thanks, Ben. Hey, everyone. This is Shivang. I'm the Chief Architect for Intuit Mailchimp. I’ve been in the industry for a while, I’d say approximately 12 to 15 years. I've been with Intuit for about seven years now; I started off as a staff engineer. Pleasure to be here.

Merrin Kurian Hi, everyone. I'm Merrin Kurian, Principal Engineer and currently AI Platform Architect at Intuit. In this role, I am responsible for both the ML platform for our classical AI, the model development and serving, and also GenOS, which is what we are here to talk about, which is our paved path for generative AI app development at Intuit. I've been with Intuit for 15 years, and I've worked with Shivang in the past when we were both in the QuickBooks group. For the last three years I've been in the AI org. 

RD So like Ben said, you all have been talking with us about AI for a while. You were on the podcast talking about your engineering program and AI, so how foundational is AI to the way you work and the products you build? 

MK So Intuit's mission is to power prosperity around the world. We deliver that by solving the most important financial challenges for our 100 million customers, and all that we do at Intuit is in service of less work, more money, and complete confidence for our customers. So getting to the AI aspect, as you said, five years ago Intuit declared its strategy to be the AI-driven expert platform, thereby making data and AI core to our strategy. So what that means is that we don't use AI just to identify opportunities to cut costs, for example marketing spend, or to estimate risk in financial transactions, but we have a well-defined product strategy that incorporates AI into key customer workflows. So let me give you a few examples. We have our document understanding capability, which automatically extracts information from forms and documents, and this is leveraged for receipts in QuickBooks to do transaction categorization. It also works for tax forms in TurboTax. Another great example is our personalized models for doing transaction categorization in QuickBooks. It helps, with minimal work on the business owner's side, to keep their books clean and their accounting trackable. All of these are great examples of us saving time for our users by automating tasks leveraging AI. Now, I want to share a few statistics to show the extent of our investments in this area. Our ML platform powers 810 million AI-driven customer interactions per year, 65 billion machine learning predictions per day, 25 million natural language conversations per year, and two million models running in production per day. And we have about 900 AI, machine learning, and data science US patents. And just to add a little bit more before we get into the generative AI aspect, traditionally we have been investing in knowledge engineering, machine learning, and natural language processing, so Gen AI is a new dimension we are adding to our AI capabilities. 

BP Nice. So you said that there's on the order of tens of millions per day. What is that coming from? Can we break that down for your average suite of customers, maybe across a couple of industries? Are those things they barely even notice, just part of a predictive suggestion or something that assesses a document and then offers it back to them cleaned up? Where does such a high volume of AI interactions happen, and how light of a touch is it really on the customer end? 

MK Have you uploaded your W2 or any of your 1099s to TurboTax and seen that it automatically fills the forms and you don't manually enter? That is one, and in QuickBooks, if you do a few transaction categorizations on your own, it will learn how you expect these transactions to be categorized. You will never have to repeat the same kind of categorization again because the AI learns from the few examples that you gave. And it's personalized, so based on your company's specific profile, the model learns and then it repeats. 

BP Got it. That makes sense. 

MK And then there are examples like cash flow forecasting. It uses several techniques, looking at all your finances. If you go to cash flow forecasting in QuickBooks, you can see that it actually predicts your cash flow. So all of these are powered by AI. 

BP Good. Those are good examples. 

SS From a Mailchimp perspective and in the marketing domain, AI has been the topic of conversation for us right now. We know for a fact that 90% of marketers know that they need to leverage AI, but only 50% of them are actually getting started with AI in general. Intuit Mailchimp made some strategic acquisitions in this AI space very early on. We did the Sawa acquisition in 2019, and Inspector 6 was somewhere around the 2020 time frame, I think. Both of these acquisitions balanced AI with marketing domain expertise, and since then we have focused on making AI work for customers for quite some time now. Ben, as you said, it's part of their day-to-day activity, maybe on sending an email. We have models that help you with send time optimization, which is very critical from a marketing domain perspective. We have product recommendations. When you're designing your email, we are able to recommend which product you should put in your campaign that will essentially get you a high clickthrough rate, for example. And Gen AI just keeps pushing the limits of it, because in the past year or so there's been a Cambrian explosion of what you can do with Gen AI in the creative content creation space. It's just been amazing. 

BP Cool. That all makes sense. So the kind of AI that we talked about before would be something like the ability to read a document, then translate and understand where it needs to go in a form fill, or do financial modeling and predictive cash flow. The hot topic these days is GenOS. Maybe on the marketing side you are, but I don't think you're writing a lot of poetry or generating a lot of images for your business customers, so let's jump into that. For folks who haven't heard of it, what is GenOS? What do you mean by generative operating system? And now that you've built this for Intuit’s engineering org, how are folks using it? 

MK Let me add some context. We have been on this journey to democratize AI so that all of our technologists, not just data scientists, can leverage the power of AI and accelerate its adoption into Intuit products. So three years ago when I joined this org, that was the exciting part about my role. I led the development of our first no-code solution, we call it Model Studio in our AI marketplace, where you just bring your training data and the solution will do the training, testing, deployment, evaluation, and monitoring. The whole MLOps lifecycle is available to you with no code. Data scientists typically do this, but analysts found great value because they don't have the engineering capacity to do the rest of the MLOps. They are great at working with data, so that's where we started our investments in democratizing AI, for not just data scientists, but everybody. GenOS is just the latest in that series of investments, the next wave of democratizing AI. Generative AI itself has helped democratization a lot. In typical or classical AI, you need to come up with a featurization pipeline, you need to identify features. Here, the models themselves are so large and have so much knowledge in them that you just need to instruct them with prompts and responses, which most people can do. So the appetite to leverage Gen AI in all possible experiences inside Intuit products and inside employee-facing internal products was huge. From that need came GenOS. You need one paved path, one way to develop Gen AI applications, so it helps streamline development. You can institutionalize the combined knowledge of the organization. It gives you good abstractions so you don't have to implement everything from scratch when you build applications. So that's the idea of GenOS. It's really an operating system, if you will, and we can draw parallels to a typical operating system, but before that, I just want to touch on three main challenges. One, yes, velocity and the acceleration of product development and the scale we want to have for the whole company by building GenOS. The second is that commercial off-the-shelf LLMs are great, but they don't have Intuit's domain knowledge, so we won't be able to solve customer problems just by sending prompts. We have to overcome those limitations, and there are well-documented techniques like retrieval augmented generation, or grounding the prompts with your domain capabilities. We bring all of that under the GenOS umbrella as a paved path, so every team doesn't have to go learn all the different techniques. They can just bring their domain capabilities as plugins into the ecosystem and it will just work. And last and most importantly, we have our established responsible AI and data governance principles and practices at Intuit. We want to make sure that everybody in the company who's developing Gen AI adheres to these. So what better way than to centralize all of this and build good abstractions, so teams don't have to worry about it and it will all be taken care of for them. 
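
To make the grounding technique concrete, here is a minimal Python sketch of the retrieval-augmented generation pattern Merrin mentions. Everything in it is an invented stand-in rather than a GenOS API: the keyword-overlap retrieval, the prompt template, and the call_llm placeholder. A production system would use vector search and a real model endpoint, but the shape of the flow, retrieve, then ground, then generate, is the same.

```python
# Minimal RAG sketch: ground a user question with domain documents before
# sending it to a model. All names here are hypothetical stand-ins.

def retrieve_domain_context(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Naive keyword-overlap retrieval; real systems would use vector search."""
    query_terms = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(query_terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_grounded_prompt(query: str, context: list[str]) -> str:
    """Prepend retrieved domain snippets so the model answers from them."""
    context_block = "\n".join(f"- {snippet}" for snippet in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}"
    )

def call_llm(prompt: str) -> str:
    """Placeholder for whatever model endpoint the platform exposes."""
    return f"[model response for a prompt of {len(prompt)} characters]"

docs = [
    "Schedule C reports profit or loss from a sole proprietorship.",
    "Mileage can be deducted at the standard IRS rate.",
    "Send-time optimization picks the hour a subscriber is most likely to open.",
]
question = "How do I report profit from my sole proprietorship?"
grounded = build_grounded_prompt(question, retrieve_domain_context(question, docs))
print(call_llm(grounded))
```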

BP Shivang, maybe you want to add to that. If I'm an engineer listening to this, what is the tech stack here? What is it built on? And if I'm interacting with it on a day-to-day basis, as Merrin said, are we talking just drag-and-drop no-code/low-code, or also the ability to get your hands dirty when you want to?

SS That's actually a great question. I’ll start off with an analogy. I have young kids at home, so this is the analogy that comes to my mind. GenOS is like a Lego table with big Lego sets available on it. You have square Legos, you have rectangle Legos, you have triangle Legos, and the Lego table itself. Depending on the domain that you are in and the kind of use case you're trying to go after, you can not only pick and choose the right Lego piece, but at the same time put it on the same table so other people can reuse the structures that you have built. So from a consumer perspective, a loosely coupled, extensible, and pluggable system is what you're generally looking for. And to answer your question, it's a combination of two buckets. One, reuse what's already available. A lot of these features, like security guidelines, data policies, data governance guidelines, and the abstraction over different types of LLMs, are already available in GenOS. Those Lego pieces are already there. As a consumer, you're just putting them together. But you also get into a world of domain-specific use cases where you do need to build your own Lego piece that can plug into your Lego table. That is comparatively easy to do because, guess what, the contract is already there. You can always have a 2x2, 4x4, or 10x10 structure, so it's very easy to contribute, it's very easy to inner source. And Mailchimp actually had product use cases where we actively contributed back to GenOS, specifically around the management of prompts, prompt engineering, and the governance of prompts.
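
A rough sketch of what the "contract" in the Lego analogy could look like in practice: a small interface that domain teams implement and register so the platform can discover and invoke their capability. The class names and the in-memory registry below are hypothetical illustrations, not GenOS code.

```python
# Hypothetical plugin contract: domain teams subclass DomainCapability and
# register an instance so the platform can look it up by name.
from abc import ABC, abstractmethod

class DomainCapability(ABC):
    """Contract every pluggable capability agrees to."""
    name: str

    @abstractmethod
    def run(self, request: dict) -> dict:
        ...

_REGISTRY: dict[str, DomainCapability] = {}

def register(capability: DomainCapability) -> None:
    _REGISTRY[capability.name] = capability

class PromptGovernance(DomainCapability):
    """Example plugin: flag prompts containing disallowed terms."""
    name = "prompt_governance"

    def run(self, request: dict) -> dict:
        banned = {"ssn", "password"}
        text = request.get("prompt", "").lower()
        allowed = not any(term in text for term in banned)
        return {"allowed": allowed}

register(PromptGovernance())
print(_REGISTRY["prompt_governance"].run({"prompt": "Draft a Thanksgiving campaign"}))
```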

RD So the base of any generative AI is the large language model. Are you all using existing large language models or did you build your own? And if you built your own, why? 

MK We do both. As I said, it's not sufficient to rely only on off-the-shelf models, and there are reasons. One, we have been investing in NLP as a specific niche area in AI and there was always this need for language models, it's just that the language models happen to be large of late. So we have all these investments in NLP areas where teams have been building language models specific to domains, custom training them on finance data or tax data or personal finance data to solve domain-specific problems. There is also the need to manage cost, latency, and accuracy, which you cannot manage on your own if you just buy commercial off-the-shelf LLMs. So we use a mix of both depending on what the situation warrants, but our strategy is to always use the best one to solve our customers' needs, whichever LLM that is. 

SS I'll just add onto that, the kind of knowledge and features that we want to provide to our customers can be grouped into roughly three buckets. The topmost level is the general knowledge about the domain: what is marketing, how does it work, et cetera, which any open source LLM can generally provide based on what it has learned. The second bucket, the second layer here, is, “Okay, what are the best practices of this domain?” And Mailchimp, for example, has been a market leader in the marketing and email marketing domain for quite some time now, so that's the data that we want to start using to train these language models and be able to provide those best practices to our customers. And then finally, the third bucket is getting into the world of, now that you have provided general knowledge and best practices, how do you personalize this for your customers, specific to their use cases? Not just the generic, “Oh, I'm going to send an email” use case, but more like, “I'm trying to send this Thanksgiving campaign. What do I need to do?” And that's where the real fine-tuning of the LLMs comes into play. 
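
One way to picture those three buckets is as layers assembled into a single instruction, with personalization narrowing the more general layers. The sketch below is purely illustrative: in practice the lower buckets may be baked into a fine-tuned model rather than passed as prompt text, and all of the example strings are invented.

```python
# Illustrative layering of general knowledge, best practices, and
# customer personalization into one prompt. Not a GenOS interface.

def layered_prompt(general: str, best_practices: str, customer_profile: str, task: str) -> str:
    return "\n\n".join([
        f"Domain background: {general}",
        f"Best practices: {best_practices}",
        f"Customer context: {customer_profile}",
        f"Task: {task}",
    ])

prompt = layered_prompt(
    general="Email marketing uses campaigns sent to segmented audiences.",
    best_practices="Subject lines under 60 characters tend to improve open rates.",
    customer_profile="Small bakery, 2,400 subscribers, strongest engagement on weekends.",
    task="Draft a Thanksgiving campaign announcement.",
)
print(prompt)
```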

BP That makes sense. So this seems like a huge project. You talked about doing this before, so I'm sure you had teams involved, but give us a sense of the scale here, both in terms of the software you have running internally, and we talked about the number of external interactions you have per day and per year, and then also, what does it take to manage something like this on an ongoing basis? 

MK Let me talk about the architecture of the internal components of GenOS itself so you can see what is involved. We have a layer called GenUX, which is specifically created for interactions and rendering for Gen AI experiences. Then we have GenRuntime, which includes the resources to run applications on GenOS. It also provides services such as the Planner, which is like a scheduler: you create a plan to execute a request, and then there is an executor which actually executes the request. Additional services we offer are memory, long-term storage, and of course the LLMs themselves. Then we have the pluggable system of domain-specific capabilities, like agents and tools. We also have role-based and use-case-based access controls to ensure the right level of access across the system. Then we have all the guardrails that we talked about, and it comes with out-of-the-box observability, governance, and cost attribution. This is the runtime. We also have a design-time, or developer-time, tool called GenStudio where you experiment with your prompts, get feedback, and evaluate and optimize. We have integrated controls to test and detect potential issues, and evaluation frameworks to help you measure the accuracy of the GenOS application itself. So those are all the different components, just to help you understand what GenOS is. Now, how do we build something like this? I can break it down to people, process, technology. We looked at what technology is involved, and we have been fortunate to have leadership buy-in at the highest level. The CTO and CTO staff are directly involved and provide directional guidance. So it starts from there; that way we are set up for success. We have good processes, and if I go into more depth, we have dozens of mission teams who came together in this huge mission across the company to make this happen, and we have check-ins and processes at all levels, starting from the CTO down to the scrum teams. Again, the goal is to ensure teams have the right resources and are prioritizing the right things. We have well-defined forums where we come together to review design and architecture and analyze and troubleshoot to ensure transparency and accountability, because this is new and everybody is moving at breakneck speed and we all need to collectively bring in our knowledge. We cannot wait for one team to go off in isolation and build something. We need the collective learning and understanding of everybody in the company to come into one unified system. And then what really helped is fast decision making. We have clear escalation paths and exception processes, and I can tell you, in my 15 years at Intuit, I have never seen something at this scale moving at this pace. I have seen very slowly executed large projects, but such a large project at such a pace I haven't seen before. So it's all thanks to the processes and the organizational culture that led us here. The other aspect is that we have had good investments around platform, AI, and data leading up to where we are today, which enabled all of this. And I want to reinforce this message: I'll be the first to acknowledge that however large a mission team you have, you will still not be able to solve everybody's problems. We have been very fortunate to get good contributions back from our users to GenOS, like Shivang said. We have had special interest groups organically form around solving common problems and then contribute back to GenOS. And we continue to ensure that GenOS is extensible for our customers. Like I said, we will prioritize the key capabilities that we need to build, but we cannot solve for everybody, so we have a pluggable system which allows for customization, whether it's UI, whether it's agents and tools, or even the guardrails we want to enforce. So together, all of these should help us scale and continue to evolve. 
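
The Planner and executor that Merrin describes follow a common plan-then-execute pattern. Below is a stripped-down illustration of that loop in Python; the tool names, the hard-coded plan, and the way outputs are chained are invented for the example, whereas GenRuntime's real planner would be model-driven and far richer.

```python
# Sketch of a plan-then-execute loop: a planner turns a request into ordered
# steps, and an executor runs each step with the tools it is allowed to use.
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "lookup_transactions": lambda arg: f"3 uncategorized transactions for '{arg}'",
    "draft_summary": lambda arg: f"Summary drafted from: {arg}",
}

def plan(request: str) -> list[tuple[str, str]]:
    """Return (tool, argument) steps. A real planner would use an LLM."""
    return [
        ("lookup_transactions", request),
        ("draft_summary", "lookup_transactions output"),
    ]

def execute(steps: list[tuple[str, str]]) -> list[str]:
    results = []
    previous = ""
    for tool_name, arg in steps:
        tool = TOOLS[tool_name]
        # Feed the previous step's output forward when the plan references it.
        output = tool(previous if "output" in arg else arg)
        results.append(output)
        previous = output
    return results

for line in execute(plan("last month's books")):
    print(line)
```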

BP Nice. 

SS I'll also double down on what Merrin mentioned. The first thing is that building this fast and at scale was possible because a lot of these foundational capabilities already existed within Intuit's ecosystem. We didn't have to start from scratch. A lot of these things were already there based on all the investments we've made over the past 5-10 years. So that definitely helped us get the velocity that we needed to build this fast. The second aspect is the process that Merrin mentioned. When you have literally a whole organization coming together and driving this at, as Merrin said, breakneck speed, the process of determining what is fixed, what is flexible, and what is free is very important. There are things that you don't want to compromise on. There are things that you can compromise on based on exceptions, rapid experimentation, and customer learnings, and there are some things that are just free for people to go and build on top of. So those processes helped us shape the architecture and the technology in a way where, “Hey, you know what? You're just going to go ahead and build an API? Fine, go ahead and do it. Oh, wait, are you about to add an agent or a new executor to the system? No, wait, talk to Merrin first. Make sure we actually think deeply about the architecture around it before we just assume this is going to work and move forward.” So from that perspective, I think as Merrin mentioned, people, process, technology, the organization coming together and building this was very critical.

RD So we've talked to a lot of people building Gen AI tools, but when you get to the enterprise level and start thinking about reliability and privacy of data, especially since you're working with financial data and PII in the marketing realm, how do you scale it up so it's reliable and also protects all of the data that your customers and you have in there?

MK It is our consistent investments in the past that have helped us gain momentum. As Shivang said, some of these capabilities already existed, some we had to invent, and some we had to extend. All we had to do was bring all of these together under one umbrella. I'll give concrete examples. We have a well-defined data governance strategy, and not just a strategy, we actually have solutions that work. The entire tech community is trained on how to handle data. Likewise, we have well-established responsible AI principles and practices, but those used to be limited to the AI community. What had to change in this case was that, because now everybody in the company is dealing with Gen AI, the responsible AI training had to be done across the company. So the first thing you do when you come to GenStudio is actually take the responsible AI training, and responsible AI itself got an update considering Gen AI and LLMs. So all of that paved the way to make it easier for us to scale while still protecting data. As you said, we have compliance requirements and highly regulated industries, and these vary, and there is only one GenOS that has to cater to every possible compliance need. We have accounting, we have tax, we have personal finance, we have email history from Mailchimp. All of these have their own nuances of security, legal, privacy, and data handling, and if we left it to individual teams to solve on their own, we just wouldn't have enough visibility or governance, so we decided to take it on ourselves. My first two months working with GenOS were spent simply collaborating with our security, legal, and compliance partners to understand all the requirements and ensure they were baked into GenOS. That way we had all the policies, practices, and training in place before we even opened up, before we onboarded the first use case to GenOS, and that continues even today. I wouldn't say our work is done, but the processes are in place. The policies are in place. All we need is to invest more in automation and make it easier and easier for teams to leverage. And this is key to acceleration. Once all of these are centrally managed and come out of the box for users, they don't have to worry about any of this while they build their applications. They just have to focus on the business value that they can deliver to their customers. Now I can give some other examples of capabilities that we leverage. We have authorization systems which go into very fine-grained access controls. We had to develop additional controls there, but we could leverage the existing authorization capabilities attached to them. But what really changed is the type of data we are dealing with. Traditionally, Intuit would deal mostly with structured data: finance, tax, accounting, personal finance. The nature of that data is well understood; it's all structured. The big difference from where we were to where we are now is that it's mostly unstructured. With prompts we can define structure, but LLMs can generate any kind of output, so we need to define a structure for the response, and then our existing mechanisms work. So this is the kind of technology we had to develop, but otherwise I would say all the pieces were there. We just had to assemble them all under one umbrella and provide good abstractions so teams can accelerate their development.
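
Defining a structure for the response, as Merrin puts it, often means asking the model for output that matches a declared schema and validating it before anything downstream consumes it. Here is a small sketch of that idea; the schema, field names, and the fake model output are invented for illustration.

```python
# Sketch of constraining free-form LLM output: expect JSON that matches a
# declared schema, then validate it before passing it downstream.
import json
from dataclasses import dataclass

@dataclass
class CategorizedTransaction:
    description: str
    category: str
    confidence: float

def parse_model_response(raw: str) -> CategorizedTransaction:
    """Reject responses that don't match the expected structure."""
    data = json.loads(raw)
    required = {"description", "category", "confidence"}
    missing = required - data.keys()
    if missing:
        raise ValueError(f"response missing fields: {missing}")
    if not 0.0 <= float(data["confidence"]) <= 1.0:
        raise ValueError("confidence out of range")
    return CategorizedTransaction(
        description=str(data["description"]),
        category=str(data["category"]),
        confidence=float(data["confidence"]),
    )

fake_llm_output = '{"description": "Office Depot #214", "category": "Office Supplies", "confidence": 0.93}'
print(parse_model_response(fake_llm_output))
```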

BP Nice. Shivang, do you want to add to that? And again, I think the more concrete you can be, the better from the developer experience perspective. 

SS Intuit has had a unified data architecture for more than a decade now, as long as I can remember, and part of having a unified data architecture is not just how the data moves or how the data is stored, but also how the data is leveraged, how the data is governed, the tooling around it, the observability around it, and the automation around it. All of these things already existed in a very loosely coupled fashion as part of our foundational capabilities, and what Merrin talked about is essentially adding extensions to those systems so that we can do Gen AI more concretely. So I'll give some examples here. For example, we had very specific use cases in the marketing domain where we have to have the right level of governance on the data that the GenOS systems leverage for sending recommendations to customers. It was very easy for us to plug in a data policy that says, “Do not retain this data beyond X amount of time.” It was probably a matter of hours, if not days, to just go ahead and put it in there and say, “Okay, the data is not going to be retained. You're done. Once the request-response happens, we are not worried about the governance from that point onwards.” So that's just one of the examples. There are other security-related examples where we can just add policies to an ongoing framework of security policies, and it's somewhat low-code/no-code: you go and add some YAML files and you're essentially done. So from a consumer perspective, it was very easy to plug your custom policies in from a data and a security perspective. 
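
As a rough picture of the kind of declarative policy Shivang describes, the sketch below expresses a hypothetical retention rule as a Python dict (standing in for a YAML file) and a helper that enforces it after a request/response cycle. None of these keys or field names are real GenOS configuration; they only illustrate how a zero-retention policy could drop data once the response is returned.

```python
# Hypothetical retention/redaction policy and the helper that applies it.
import time

RETENTION_POLICY = {
    "dataset": "mailchimp_recommendation_inputs",
    "retain_seconds": 0,          # 0 means: do not persist beyond the request
    "redact_fields": ["email_address", "audience_segment"],
}

def apply_policy(record: dict, policy: dict) -> dict | None:
    """Redact sensitive fields; drop the record entirely if retention is zero."""
    if policy["retain_seconds"] == 0:
        return None  # nothing is written once the response has been returned
    cleaned = {k: ("<redacted>" if k in policy["redact_fields"] else v) for k, v in record.items()}
    cleaned["expires_at"] = time.time() + policy["retain_seconds"]
    return cleaned

# Zero retention drops the record; a nonzero policy keeps a redacted copy.
print(apply_policy({"email_address": "a@example.com", "clicks": 12}, RETENTION_POLICY))
print(apply_policy({"email_address": "a@example.com", "clicks": 12}, dict(RETENTION_POLICY, retain_seconds=3600)))
```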

BP Great. So Merrin, I heard you mention a phrase somewhere in the podcast that really resonated with me, which is that everything feels like it's moving at a breakneck pace in the world of Gen AI. So looking ahead, without giving away any of the roadmap, what are you excited about? I'm sure you've got some ambitious ideas, whether it's GenOS or some of the new capabilities, the emergent stuff that's happening in AI. What are you excited about, either as an engineer or from the customer perspective? 

MK One thing we are learning quickly is that however fast we move, we are not moving fast enough. So we are doubling down on our rapid experimentation track to enable more than what we already provide. Right now we have very well-defined guardrails, and we let people in only after we have explored, experimented, and ensured that everything is airtight. But we are learning that we should open up more avenues for people to explore any technology they want, provided they don't put any sensitive data out there. So that's something we are doubling down on. That way, as the core team, the operating system team, we don't have to invest in every single direction. The teams exploring the technology will bring back their learnings, and then we can prioritize and embed them as part of the operating system.

SS Similar sentiments here. Rapid experimentation is the name of the game right now. Learning from our customers how they want to leverage AI, specifically Gen AI, is the most critical thing for us. And based on that, how do we actually enable this rapid experimentation, and what capabilities we need to build into GenOS, is top of mind for us right now.

BP Nice.

[music plays]

BP All right, everybody. It is that time of the show. Let's shout out someone who came on Stack Overflow and shared a little knowledge and helped the community. A Lifeboat badge was awarded November 16th to Mohsin Naeem for answering “How can I extract metadata from an MP3 file?”, which has helped over 18,000 people. I'd like to get my old MP3 collection back together. I want to say to everybody, thanks for listening. As always, I am Ben Popper. You can reach me @BenPopper on X. If you have questions or suggestions for the show, hit us up at podcast@stackoverflow.com. And if you enjoyed the program, leave us a rating and a review, because it really helps.

RD I'm Ryan Donovan. I edit the blog here at Stack Overflow. You can find it at stackoverflow.blog. And if you want to reach out to me on X, my handle is @RThorDonovan. 

MK So my name is Merrin Kurian. You can find me on LinkedIn– Merrin Kurian, or X @MerrinKurian. And we have details about GenOS in our Intuit engineering blog on Medium.

SS This is Shivang. You can find me on LinkedIn– ShivangShah15. Reach out to me directly if you want to learn more about MailChimp or marketing use cases on Gen AI. 

BP Nice. All right, everybody. Thanks for listening. We'll be sure to put those links in the show notes if you want to check out more.

[outro music plays]