The Stack Overflow Podcast

Making ETL pipelines a thing of the past

Episode Summary

On today’s episode we chat with Cassie Shum, VP of Field Engineering at RelationalAI, about her company’s efforts to create what it calls the industry’s first coprocessor for data clouds and language models. The goal is to allow companies to keep all their data where it is today while still tapping into the capabilities of the latest generation of AI tools.

Episode Notes

RelationalAI’s first big partner is Snowflake, meaning customers can now start using their data with GenAI without worrying about the privacy, security, and governance hassle that would come with porting their data to a new cloud provider. The company promises it can also add metadata and a knowledge graph to existing data without pushing it through an ETL pipeline.

You can learn more about the company’s services here.

You can catch up with Cassie on LinkedIn.

Congrats to Stack Overflow user antimirov for earning a lifeboat badge by providing a great answer to the question: 

How do you efficiently compare two sets in Python?
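For readers curious about the substance of that answer: Python's built-in set type already provides comparison operators, so explicit loops are rarely needed. A quick sketch (illustrative, not necessarily the exact code from the answer):

```python
a = {1, 2, 3}
b = {3, 2, 1}
c = {1, 2}

# Equality ignores order and duplicates
print(a == b)  # True
# Subset and superset tests
print(c <= a)  # True: every element of c is in a
# Difference: elements in a but not in c
print(a - c)   # {3}
# Symmetric difference: elements in exactly one of the two sets
print(a ^ c)   # {3}
```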

Episode Transcription

[intro music plays]

Ben Popper Hello, everyone. Welcome back to the Stack Overflow Podcast, a place to talk about all things software and technology. I'm your host Ben Popper, joined by two of the greatest: Ryan Donovan, Editor of our blog and newsletter, and Cassidy Williams, a longtime collaborator on the podcast, longtime contributor to our newsletter, and overall software development influencer, I want to say.

Cassidy Williams Oh, goodness. Not that word. But hey! 

BP Hey! So we have a Cassidy/Cassie episode today. We're doubling up. Our guest today is Cassie Shum, who is VP of Field Engineering at RelationalAI. They're working on what they call the industry's first AI coprocessor for data clouds and language models. She spent over a decade at ThoughtWorks before joining RelationalAI. And I think for software developers, the key in this conversation is, what are the real gains that can be captured in terms of software engineering productivity? What does that mean for jobs and what does that mean for career tracks? And then, of course, everybody in the industry is working on various Gen AI-powered tools, so people want to talk about how you build them and how they run in the background. All right, so without further ado, Cassie, welcome to the Stack Overflow Podcast. 

Cassie Shum Hi. Thank you for having me, everyone. 

BP Of course. So we always ask guests to just give our listeners a quick overview. How did you get into the world of software and technology and what brought you to the role you're at today at RelationalAI? 

CS It's actually an interesting one. I did not start in technology, I didn't start in software. I actually started in biology. For a very long time, I thought I was going to go the biology route and then become a doctor. I didn't want to let my parents down. They all thought, “Okay, doctor material here.” I realized very early on that that was not what I wanted to do. So what does one do with a biology degree? You go into research. And so I went into research for quite some time, really studying Parkinson's and Alzheimer's, neurodegenerative diseases I found really interesting. But I'll be very, very honest with you. I did not like the feedback loop. The feedback loop was very, very long. So if you're actually running a bunch of experiments, you wouldn't know if they actually worked or not, probably for years, if not more. And so while I was doing research, I was introduced to some programs in order to help with some of my research. And I'll be honest, I spent more time trying to understand the programs, trying to understand the software behind them, and trying to improve the software than I did on the experiments themselves. So I used that opportunity to go back and get my Master's in Computer Science, and then after that I said, “No more with research. I'm just going to go dive into the world of technology.” So I had a few software engineering roles to start off with, which was wonderful, and then I ended up at a company called ThoughtWorks, which I think gave me the most diverse experience that I’ve had in technology. I was there for over a decade, which is very long in software land, but it didn't feel like that because I was essentially in a services organization where we were experiencing all different types of customers, all different types of technology, so in a sense, I had multiple jobs within that decade's worth of work. And then with that journey, I was able to learn how to lead engineers, lead organizations. 
I was an enterprise architect for quite some time, so I’d look at big technology problems that large enterprises had and really dive into solution architecting and all of the different people problems that organizations have. So you name it, I probably had my hands in it at some point. And so a few years ago I decided I didn't really want to be in a services organization anymore. I really wanted to be in a company where we owned a product. And so I was very lucky to have met my CEO, Molham Aref, at RelationalAI and he brought me on as the Head of Field Engineering. What we really do here from a field engineering point of view is we still have the strong customer touchpoint. We are there to implement our product in the customer environment, so it combines a lot of my old world with a lot of the new world of the product, and lets me really drive that to fruition for some of our customers. So it's been a great journey.

BP Cool.

Ryan Donovan So I want to talk about developer productivity and Gen AI. I’ve written a little bit about how generative AI isn't really the thing that's going to boost your productivity and I know you have opinions on this. So what in your opinion is the real way to boost developer productivity? 

CS I think I'd go more to the core first before I talk about what the boost actually is. So what is developer productivity? I'm sure we could write many articles and talk for hours and hours about that, but I think what it really boils down to is: where is the value that your engineers and your developers are actually producing? The value really goes into what your organization is actually trying to accomplish, what features you’re trying to ship, what is in front of the customer, and does the customer love your product? All of these things I think about as the outcomes that you really have for your engineering organization. We forget that sometimes. A lot of times I see organizations say, “How many lines of code have you written in X amount of time? What is the cycle time? What are all of those things?” These are very good metrics to drive out, but at the end of the day, we need to actually think about whether we're accomplishing the outcomes we're aiming for. So when I think about Gen AI and when I think about productivity, I think about Gen AI like Copilot. I know Microsoft just announced their Copilot Studio. Things like that I see as more of an enabler of productivity. How do you actually get to an answer quicker? How do you actually understand something over large amounts of data quickly and easily, to enable you to do things? But it's not going to give you the actual answer. You yourself are still going to provide the answer, provide the talent, provide the solutions that you need in order to drive that outcome. 
Because it is a lot more complicated than saying, “X + Y = Z.” It's actually, “Well, there's this particular thing, and then there's these environmental factors, and then this is actually how we're going to measure the outcome and the productivity.” So there are many different factors, which is why I would look at things like Gen AI, Copilot, and developer productivity tools as aids, as opposed to the end-all, be-all. There are a lot of foundational things that we have to keep in mind as engineers. We can't just remove them and say, “Oh, well we don't do that anymore because we have this thing.”

CW I agree with you wholeheartedly on that because I feel like a lot of people when they see these AI tools, they're just like, “Ah, it'll do the work for me. Great,” instead of, “It will help me do my work better.”

CS Absolutely.  

BP So moving away from productivity just for a second, can you tell us a little bit about what the product is that RelationalAI is offering? It’s claiming an industry first here, so I'll let you defend that claim: an AI coprocessor for data clouds and language models. What does that do for clients? What pain point are you addressing? And what are you seeing out in the field as you go out to do engineering with the customers? 

CS Absolutely. So what I will talk about is something very exciting that's happening for us at RelationalAI as we're launching this AI coprocessor. We're starting in the data cloud with Snowflake, which is one of our first technical partners. Essentially, what we're seeing with customers right now is that they have all these different use cases that they're trying to accomplish. So there are use cases like entity resolution, there's customer 360. There are all these graph analytics and business rules that we're trying to run over a large amount of data. And so what do customers usually do, in my experience? They usually have data that is either siloed in many different areas or in a data warehouse like Snowflake. And then they say, “Okay, what do we make of this data? What insights are we going to get out of it? What are we going to do with this kind of data?” So what happens is that we start creating a lot of pipelines. We say, “Okay, let's transform the data in such a way that I can now run some graph algorithms or some business rules on it and get some sort of insights.” But what we see is that you're taking data outside and you're actually putting it somewhere else in order to get it into such a shape that you can actually do that. That takes up a lot of time. That takes up a lot of energy. That takes up a lot of people power as well, in terms of teams and maintenance around your pipelines and things like that. So what we're very excited to launch in a few weeks is that we brought together all of what we call AI and analytics, what Gartner calls ‘composite AI.’ This encompasses business rules, analytics, predictive analytics, prescriptive analytics, Gen AI. We're trying to bring that closer to the data so you don't have to go through all that transformation and you can actually capitalize on the data that you have within your ecosystem. 
So with our product today, you will be able to actually install RelationalAI from the Snowflake marketplace. You don't have to move your data out. The data stays exactly where it is, and now you can actually utilize these AI tools on top of your data itself, so you don't have to move it anywhere else. And that's why we call this an industry first, because I think we're going to see a wave of, “Let's bring AI to the data, not data out to the AI.” And so I think this is a big inflection point for all of us, and I will say we have quite a few customers that we have been piloting with and putting in private preview for these things, and their minds are blown. They're like, “Oh my gosh, you mean I don't have to move everything out of there?” They stay within their governance, they stay within their security boundaries, they stay within their own ecosystem. So you're removing a lot of constraints as well when we think about PII data and things like that. You're still in your ecosystem and you can actually do a lot of these analytics and things. 

CW That's so huge because so many companies nowadays, they're just like, “Well, we want to use AI, but we don't want it to be trained,” or, “Our data's private, so what can you do?” That's constantly the thing that I'm seeing at least where they're just resigning themselves to the fact that they can't use AI because that's not an option for them. So that's really exciting to see that there's a different direction of it. 

CS Absolutely. 

RD Do you think that this sort of direction will make ETL pipelines a thing of the past? 

CS I hope so. That's a strong statement, but actually we are looking more to what we call ‘modernizing the data stack.’ Looking at traditional enterprise systems and things like that, we have a ways to go because we have legacy data that's just floating all over the place. So that is not going to go away anytime soon, but the movement should be going toward thinking about the modern data stack. How do you actually have all of your data in one place, while also understanding what that data is? And so a lot of what RelationalAI does is built on a knowledge graph as well. So think about when you actually move data around: you're not moving a lot of that metadata, you're not moving semantics, you're not moving the rules around it. If you think about a knowledge graph in itself, we're actually adding all of those things on top of your data as well. So there are a lot of different concepts that we think about when we think about the modern data stack, and that's one of them. 

BP It's interesting to hear you say that. I think one of the things we've been thinking about at Stack Overflow is the importance of data quality to the performance of your AI, and what's the best way to go about organizing and optimizing that data for performance, accuracy, latency, all those kinds of things. So when you say it sits on top of a knowledge graph, do you mean that you can look at the data and then add, like you said, tags, semantic understanding, and metadata to it, or are you taking what the customer has already done and just bringing that along? 

CS We're taking what the customer has already done. In some instances, for example, if we have a large amount of data where a lot of the semantics and the rules are in separate microservices outside, you can actually bring some of those more simplistic rules and semantics down closer to your data. So that's one example of what we do. I always use this example, and I know folks in my company do too: if you're thinking about calculating your age, you don't store your age in data. You probably store your birthday, and then knowledge outside knows what today is. And so that small calculation, that small bit of semantics, you don't really want to hold in multiple microservices, because as engineers, we can make mistakes. We can actually code something wrong and whatnot. And you can do it in multiple different places. Those are core semantics that should probably belong closer to your data so you can always get your age from one place. And so the core competency of what our knowledge graph does is bringing some of that really strong business logic back down closer to your data so you have it in one place and the logic's not duplicated everywhere. And we're seeing some very, very powerful things right now with that. 
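Cassie's age example can be sketched in a few lines of Python. The point is that the derived value is computed in one place from the stored fact (the birthdate) rather than re-implemented in every microservice. This is only an illustration of the idea, not RelationalAI's actual modeling syntax:

```python
from datetime import date
from typing import Optional

def age(birthdate: date, today: Optional[date] = None) -> int:
    """Derive age on demand from the stored birthdate instead of storing age."""
    today = today or date.today()
    # Subtract one if this year's birthday hasn't happened yet
    not_yet = (today.month, today.day) < (birthdate.month, birthdate.day)
    return today.year - birthdate.year - not_yet

print(age(date(1990, 6, 15), today=date(2024, 1, 1)))  # 33
```

Keeping this rule in one place next to the data means no two services can drift apart on how age is computed.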

BP So I think one of the things that, again, thinking about what we've been doing at Stack Overflow, we're interested in is: how do you ensure the quality of the data the AI is drawing on, whether that's for its training, its fine-tuning, or what seems to be the emerging best practice for many: a RAG system, where you take all the great capabilities of the latest foundational model but point it at your own specific data set. And that would probably be what RelationalAI would be working with: the company's internal knowledge base or code base or whatever setup they have that they want to draw on. But then, for ensuring the accuracy and the freshness of the data, what do you advise clients at RelationalAI? Let's say somebody is moving from an old system where they have documents, they have wikis, they have code comments, they have Jira tickets. How do they pull all that stuff in together so that they can make the most of it without hallucinations or inaccuracies?

CS One of the things that our company is working on right now is that specificity of the domain. So not only are we providing this tool, but a lot of my field engineering team is comprised of what we call knowledge engineers. They're very good at modeling different concepts within organizations. So if you are in a telco, they're able to model all of the different relationships within that ecosystem and help weave the data the customer already has into some of these models and concepts. It gives more fidelity. It gives more of that domain-driven design thinking, so you have more information about your data in itself. And with that, I think the rest of it actually comes a bit naturally when we think about RAG. Because the core foundational point of RAG is to focus specifically on what you're actually looking for, to your point, so we can remove hallucinations and things like that and get a lot more precision. But at the end of the day, whether it's a knowledge graph or whether it's your data itself, it has to be specific to you and your business problem and your domain. And so a lot of that is around cleanliness. It's around doing the hard work of making sure your data is clean and reflects your organization, your business problems, your business processes. And if it's not doing that, then you can apply all the Gen AI in the world on there and it's not going to give you exactly what you need. So there are foundational bits that need to be done in order to apply that.
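The RAG pattern Cassie and Ben are describing can be reduced to a toy sketch: retrieve the snippets most relevant to a question from your own corpus, then hand only those to the model as context. The keyword-overlap scoring below stands in for the embedding-based vector search a real system would use; all names and documents here are made up for illustration:

```python
def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query (toy retriever)."""
    q_words = set(query.lower().split())
    return sorted(corpus, key=lambda doc: -len(q_words & set(doc.lower().split())))[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Ground the model's answer in retrieved context to curb hallucination."""
    context = "\n".join(retrieve(query, corpus))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Employee ages are derived from birthdate at query time.",
    "All PII stays inside the Snowflake security boundary.",
    "The knowledge graph stores business rules next to the data.",
]
print(build_prompt("Where does PII data live?", docs))
```

The grounding step is what keeps the answer specific to your domain: the model is asked to respond from your documents, not from its general training data.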

CW That makes a lot of sense. I kind of want to ask a little bit more about your day to day working with customers and field engineering and everything, because I do think that a lot of our listeners probably know the concept of solutions architecture and stuff, but what does that look like when you're working on a product, especially something like this that is very new and shiny to customers, both on the selling side and on the technical side? 

CS Oh, boy. I'm not sure we have enough time for all that, but I'll try to summarize a day in the life. So my day to day differs every day. Every day is different. And the reason being is because I deal with multiple customers. We have lots of different customers in different industries essentially testing out our products. So the way I think about field engineering is that we are the folks who know a lot about the product, who work very closely with our product engineering team, and then we also understand the ecosystems of each of our customers. So somebody may be in a financial institution where, oh my gosh, PII data and security is actually a big concern, so really thinking through what that architecture looks like. I have customers who actually have maybe about 3,000 microservices that they're trying to decouple from or detangle from. And so each architecture is a bit different with our customers, so how do you fit something like our product in there? So I think we're in a phase of our company where we are trying to battle test our product in our different customer environments to see what actually works the best and get that instant feedback loop from our customer. And the only way we're going to get that feedback loop is being on the ground with them so we can see how they use our product, how effective it is, use cases that they're using it for, and then how do we optimize for that? So there's a lot of performance and security and abilities that we look at in order to stress test our product as well. And so all of these things are input into our product team to say, “Okay, these are the things that we're seeing on the ground, and now we have to prioritize what do we want our product to be and who do we want to focus on?” And so it's sort of this continuous feedback loop. So we feedback loop to our customers and we say, “Customers, you are part of our design program. You're part of our preview program. 
You're going to see some bumpy roads ahead, but you also get the chance to tell us what we should do to improve our product in itself.” So that feedback loop is just consistent, and I see field engineering being smack dab in the middle of that, because one of our primary goals is still to delight the customer. No customer is going to buy a product if it's too hard to use. So it's really about making that journey easy for them, but at the same time we're evolving the product so that eventually –knock on wood– I don't have to white-glove any of these customers anymore and the product is mature enough to just be front and center. So that is the evolution that I'm driving towards. I would say that my day to day is different, but my goal is probably to work myself out of a job, really.

CW I was just going to say, so you're basically trying to get rid of yourself eventually? 

CS Yeah, I would call that success in the future as well. There will always be customer support and things like that, but this really close hand-holding mentality I want to start getting away from because that shows maturity of your product, the self-service nature of it, the repeatability of it, and so we're in this stage where we're moving over to that. But this part is so crucial for us because that feedback mechanism is what's going to inform your product and so we need that. I think every company, every startup, any product company always thinks about that. Amazon is always customer centric because they're always taking that feedback into their products. 

CW It's like user testing, but real life where it's not in a silo. It's what's actually happening, like your title, in the field. 

RD That sounds like you and your team have to be conversant in a lot of different technologies. 

CS That’s right.

RD How do you skill up your people on new technologies they come across? 

CS Well, it's not so much about skilling up my own people. So my field engineering team is comprised of a lot of specialists, actually, that represent a lot of the AI workloads that we do. So if we think about graph analytics, I have a graph engineering team. If we think about prescriptive analytics, I have an OR team. If we think about predictive analytics, I have a data science team. So actually, field engineering is more of the specialist for me. The way we actually implement in our customer set and really go out there is actually through our partners as well. So I target partners, and ThoughtWorks being one of our partners, who actually have that breadth and skills of all the different customers and ecosystems. So there is a big partnership around this because we have folks who really understand the customer architecture and that ecosystem and the technology, as you said, Ryan, but then I have my PEs that come in and partner with them to figure out, “Okay, how do we actually work together to embed the product in to show that business value?” So we have big teams that actually represent that. 

BP Cassie, is there anything we haven't talked about that you wanted to touch on? More specifically, anything that you've worked on recently you think would be interesting or anything in terms of the value proposition we didn't get to? 

CS The only thing I would say, just as a resounding theme of what I always talk about: always try to figure out what your outcome is. If you start from your outcome and work backwards from that, then you can apply RelationalAI, Gen AI, anything that you believe is going to assist you. Never lose sight of that outcome that you're trying to drive toward. And I know this is probably a pretty common theme in general, but it's very easy to slip. Once you get into the weeds of solving problems and once you get into the weeds of, “Oh, I'm going to do this and this and this,” it's very easy to lose sight of that outcome. So being able to remind yourself and make sure that everything that you're doing is actually driving toward that is very important. I think about that for engineering teams, I think about that for product teams, I think about that for everyone. 

CW I've got kind of a fun and zany question. Because AI is everywhere right now and everybody's doing it, from the business side to the fun side: for you, what has been the most fun, exciting use of AI that's made you go, “Oh, this is really, really cool. This is going to change things”?

CS I find it really fun. I've been using ChatGPT and prompting; I've been using it to write a lot of recommendation letters. I've been using it to do all the things that I don't want to do, and I think that's pretty fun. These are things that are just menial tasks to me. One of my friends asked for a recommendation for their condo board, and I just looked at them and I said, “Are you kidding me? It's going to take me so long to do this.” I'm so bad at those things. But I had so much fun just throwing in, “This is how long we've been friends. This is when we worked together. This is the apartment that they want to apply for,” and it spit it right out, and I was like, “Okay, that was fun.” Because A, I didn't have to do the thing I didn't want to do. B, it was just so powerful that it did kind of sound like me. And C, my friend got the condo. 

CW Hey, you got your friend a house with AI.

CS Exactly.

[music plays]

BP All right, everybody. It is that time of the show. Time to shout out a community member who came on the Stack Overflow platform and helped save a question from the dustbin of history with a great answer. Antimirov was awarded a Lifeboat Badge 23 hours ago for giving a great answer which now has a score of 20 or more on a question that had a score of three or less: “How do you efficiently compare two sets in Python?” So thanks so much to Antimirov, a great answer here with 22 upvotes, and we've helped 63,000 people who had a similar question. As always, I am Ben Popper. I'm the Director of Content here at Stack Overflow. You can find me on X @BenPopper. Email us with questions or suggestions. We have brought listeners on as guests. We have done episodes with guests suggested by listeners. We have discussed topics that listeners want to hear about. We probably have ignored the listeners who told us to talk about certain things less, but we take it to heart. And I think we even have some listeners who wrote in and might be writing for our blog. So all those good things; you can participate in the community and be part of the podcast. Hit us up at podcast@stackoverflow.com. And last but not least, if you enjoyed today's program, leave us a rating and a review. It really helps. 

RD My name is Ryan Donovan. I edit the blog here at Stack Overflow. You can find the blog at stackoverflow.blog. And if you want to reach out to me with whatever, you can find me on X @RThoronovan. 

CW I'm Cassidy Williams. I'm CTO over at Contenda, and you can find me @Cassidoo on most things. 

CS I'm Cassie Shum. I'm the VP of Field Engineering for RelationalAI. You can find me at CassieND on X or LinkedIn. And for more information about RelationalAI, we are doing an overhaul of our website. So in a few weeks you'll see new content, but it'll be at relational.ai. 

BP Wonderful. Thanks for listening, and we'll talk to you soon.

[outro music plays]