Ben and Ryan are joined by Matt Zeiler, founder and CEO of Clarifai, an AI workflow orchestration platform. They talk about how the transformer architecture supplanted convolutional neural networks in AI applications, the infrastructure required for AI implementation, the implications of regulating AI, and the value of synthetic data.
Clarifai is a developer-friendly AI workflow orchestration platform built to help devs integrate AI into technical workflows and customer experiences.
We’ve written about best practices for integrating AI tools into your workflows.
Connect with Matt on LinkedIn or via his website. You can also read his posts on the Clarifai blog.
Well-deserved congrats to Stack Overflow user Jay Wick, who earned a Populist badge by explaining how to Get image preview before uploading in React.
[intro music plays]
Ben Popper Maximize cloud efficiency with DoiT, an AWS Premier Partner. Let DoiT guide you from cloud planning to production. With over 2,000 AWS customer launches and more than 400 AWS certifications, DoiT empowers you to get the most from your cloud investment. Learn more at doit.com. DoiT.
BP Hello, everybody. Welcome back to the Stack Overflow Podcast, a place to talk all things software and technology. I'm your host, Ben Popper, Director of Content here at Stack Overflow, joined as I often am by my colleague and collaborator, Ryan Donovan, Editor of our blog, maestro of the newsletter, man about town, technical writer.
Ryan Donovan Yes, sir.
BP Ryan, you and I have been doing podcasts talking about Gen AI since the world grew fascinated with this stuff in November 2022, but I remember last year talking with people and they were just kind of like, “Most of this stuff is just a test at the moment. Most of this stuff is internal at the moment. Most of this stuff is not in production at the moment.” And I think today is going to be an interesting conversation about what it takes to go from internal testing to having it in production where you can feel like you can trust it to scaling it in production, if that's something you need.
RD And I think also to talk about what was AI before the LLMs came over and took all the news away.
BP Right, because a lot of that stuff I think is still happening, it's just not getting the attention. Exactly. All right, well I'm excited to welcome our guest– Matthew Zeiler from Clarifai. Matt, welcome to the Stack Overflow Podcast.
Matthew Zeiler Hello. Thanks for having me.
BP So you are founder and CEO over at Clarifai. For the folks listening at home, just give them a quick rundown of how you got into the world of computer science and then AI and what led you to founding and running your own company.
MZ Sure. So it was really back at University of Toronto where I was doing undergrad. I took this program called Engineering Science, and you take every discipline of different engineering– civil, mechanical, computer, et cetera– for the first two years and then you specialize in the last two. And when I was deciding which option to specialize in, I happened to run into one of Geoff Hinton's PhD students who showed me his research on AI neural networks back in– this must have been 2007. I was hooked at that point because he showed me a video of a flame flickering and he said it was completely generated by neural nets. So I decided to dive into the computer option and ended up doing my undergrad thesis with Geoff Hinton, which was quite the honor. And then, having a taste of AI, I knew I had to dive even deeper and do a PhD in it. That's what brought me to New York University, and there I was working with Rob Fergus as my PhD advisor and in Yann LeCun's labs, another pioneer in the field. I also spent some time over at Google in their Brain team with Jeff Dean and others, and that was a great experience to learn kind of proper software development in an organization, but it's also where I saw that what I was doing in my PhD was working better than what Google had at the time. And so I saw that as an opportunity to start a company around this technology and get AI out there to developers. And so in November of 2013 I founded Clarifai, and about three weeks after incorporating, I ended up winning ImageNet, which was the biggest computer vision competition at the time and that really put us on the map. Investors were looking at that that year and we had some great early investors like Google, Qualcomm, and NVIDIA, even in our seed round. And so that was kind of the genesis, and now 10 years in the making, we have a lot more features and functionality, much broader than computer vision where we started back in 2013.
BP Nice.
RD Back in the day, I'm sure when you started it was the convolutional/deconvolutional neural networks. I know those aren't talked about as much lately. Are they still part of computer vision? If so, what are the advancements?
MZ For sure. So all the rage is transformers now, which is a different type of architecture than convolutional neural networks, but they have pros and cons. Transformers seem to be much more flexible and maybe more powerful of a learning engine, but with convolutional neural networks, they're better understood, you can make them much more efficient, you can run them in low power environments, and for most use cases, they are a really good fit because of those attributes. For general purpose use cases where you want to interact with these models with natural language, that's where transformers kind of help blend between different modalities and make things a little bit more interactive.
BP I just want to go back for a second to some of the names you mentioned– Geoff Hinton and Yann LeCun and the ImageNet competitions of 2012 and 2013. Do you ever feel like you were in this sort of Forrest Gump moment of meeting people who now are luminaries in the field, and that time being seen as kind of a watershed moment for a renewed interest in AI and an awakening of recognition of the capabilities that for a long time people had kind of felt were never coming?
MZ It was basically luck. In Toronto, running into –the student's name was Graham Taylor, who's now a professor up in Canada doing AI as well– but he happened to be my resident advisor on the floor I lived on in residence. So that was just pure chance that I ran into him, but everything from that point on was much more. Once you're kind of in Geoff Hinton's lab and you realize that at that time there was only four universities really doing AI research for neural networks. There was Toronto, there was NYU, Montreal and Stanford. So there was only four schools I could basically choose from for my PhD and that's what brought me to NYU. And it remained a tight-knit community until that 2012 AlexNet ImageNet which shocked everybody, because before that, computer vision was handled by a lot of handcrafted features and you would kind of template match over an image to recognize something, whereas Alex and Ilya and Geoff as part of AlexNet just trained a large neural network on GPUs and it blew everything away by like 10 percent more accurate results. So everybody finally believed neural nets are real, and Geoff and Yann have been saying that since the ‘70s and ‘80s. And now just a week ago, Geoff Hinton won a Nobel Prize for all that, so super fortunate to work with those guys.
BP That’s right. Look, I don't want to spend too much time talking about other folks, but it is worth noting that you studied with Geoff Hinton who just won a Nobel Prize and who also left Google because he felt he wanted the opportunity to speak out on the dangers of AI and didn't want to be constrained in any way by being part of a corporate infrastructure. So maybe as we get towards the end of the episode we'll touch a little on that, but let's dive into some of the details. When you founded Clarifai, what were you focused on and how has that evolved to today? How has the company changed over time?
MZ Off the start, it was based on the research I was doing at NYU which was focused on understanding images with neural nets, so very computer vision-focused. That's where ImageNet kind of fit in that bucket. And we really wanted to model the company after other developer-first companies and API-first companies like Twilio and Stripe. Those were kind of the inspirations to start. And we had our first API up in 2014. Very simply, you send an image and we tell you what's in it. We only had one model. Today people would call those foundation models. It was the biggest computer vision model for sure at that time and recognized 10,000 different things in the world. Then we realized as we started talking to customers that they had different needs beyond general purpose recognition. Some of them had wedding photos, some of them had travel, some of them wanted to filter out content that you don't want on the internet like nudity, weapons, drugs, that kind of stuff, so they wanted a moderation model. And so we started building a gallery of these pre-trained models ready to go for different use cases, and a couple of years after that, we realized that there was just no way we'd be able to keep up with every possible use case. And so in 2016, we launched the ability for users to train their own models on our platform, so they could upload their own data. At that time it was expected to upload labeled data, now we have the ability to label the data in our platform, and then you can one click train to customize a model to recognize what you care about. At that time, we also launched probably the first vector database. We never called it a vector DB, we called it a visual similarity search, but it was used to find similar images. They use a query image, find similar content, and now that's all the rage. Vector databases, since the ChatGPT moment, everybody is talking about them as being an essential building block for modern AI. 
And so you can see that it evolved from these foundation models to the tooling that you need in order to create high quality models like that. So the data labeling, the data management, the training, we have evaluation metrics, and of course, since 2014, we've been running production scale inference, up 24/7 with multiple nines of uptime. So that's why today we position Clarifai as the quickest way for you to get into production with AI. The other thing I'll just add is that around five years ago, we made the decision to move beyond just computer vision into other unstructured data types, so text and audio were added to the mix. So image, video, text, and audio is what we call the unstructured data types, and when you think about it, that's what your brain has been good at and computers traditionally have not been good at, and that's the type of AI we focused on rather than time series or rows and columns type of understanding.
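To make the "visual similarity search" idea concrete, here is a minimal sketch of looking up the nearest images to a query. It assumes each image has already been turned into an embedding vector by some model; the file names, the three-number vectors, and the brute-force cosine search are all illustrative assumptions, not a description of Clarifai's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query, index, top_k=2):
    """Return the top_k (name, score) pairs, best match first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings; a real system would get these from a vision model.
index = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "ocean.jpg": [0.8, 0.2, 0.1],
    "city.jpg": [0.0, 0.2, 0.9],
}

# The query vector stands in for the embedding of the query image.
results = most_similar([1.0, 0.0, 0.0], index)
print([name for name, _ in results])
```

A production vector database would swap the brute-force loop for an approximate nearest-neighbor index, but the query-in, ranked-matches-out shape stays the same.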
RD You talked about people uploading their own data to train the models. How do you deal with the sort of thornier issues around that data? You mentioned moderation models, those have to be trained on awful images. And if you want facial recognition, you have to train it on people. How do you deal with the sort of sensitivity of that data?
MZ That's another benefit of just providing the tooling, because it's really on the customer to make sure they have the rights to the data, whatever appropriate licenses, et cetera, and they put in that hard work using our tools to label it in the appropriate way. And it is unfortunate that to recognize that unwanted content, you kind of have to recognize it with humans first to do that labeling before the AI can do it. So it's not an easy task, but we provide the tools to make that easy and efficient so that humans are doing basically 100 times less work, and the way we do that is we automate the labeling process with AI itself. So you kind of start with your unlabeled data, label a little bit, train a model, and in a few seconds, you've got a model recognizing what you care about. And it's not going to be great off the start, but you do that in an iterative fashion. You start applying that model to label, humans become reviewers of those labels instead of creating the labels, and over time, in that loop, you get an accurate model out and very little human intervention by the end of it. So that helps in those scenarios where you don't want to have humans sitting and looking at unwanted content all day.
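One way to picture the step where humans become reviewers instead of labelers is a simple routing rule. This is a sketch under stated assumptions: the model returns a confidence score with each suggested label, and the 0.9 threshold, the function name, and the sample predictions are hypothetical and not taken from Clarifai's tooling.

```python
def route_predictions(predictions, threshold=0.9):
    """Split model-suggested labels into auto-accepted and human-review piles.

    predictions: iterable of (item_id, label, confidence) tuples.
    threshold: assumed cutoff; anything below it goes to a human reviewer.
    """
    auto_accepted, needs_review = [], []
    for item_id, label, confidence in predictions:
        if confidence >= threshold:
            auto_accepted.append((item_id, label))
        else:
            needs_review.append((item_id, label))
    return auto_accepted, needs_review


# Hypothetical output from an early model trained on a small labeled batch.
predictions = [
    ("img_001", "safe", 0.98),
    ("img_002", "unsafe", 0.55),
    ("img_003", "safe", 0.91),
    ("img_004", "unsafe", 0.97),
]

auto_accepted, needs_review = route_predictions(predictions)
print(len(auto_accepted), "auto-accepted;", len(needs_review), "sent to review")
```

Each retraining round would fold the reviewed labels back into the training set, so the review pile should shrink as the model improves.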
BP So I want to ask a sort of self-centered question now that takes a little bit of what Stack Overflow is working on and puts it in the context of what you're doing. A lot of people are dreaming of an enterprise search that is very different from the way we knew it for the last 10 years. It's much smarter because it can vacuum in all the data from inside of your company, make sense of it, and then respond to a natural language question with a natural language answer. So you had mentioned that you now are able to ingest unstructured data from clients, help them label that, and therefore then they can then make use of it. So if a client was to come to you with a desire of enterprise search as a big knowledge base for their company, and they're giving you stuff from their Confluence and their JIRA and their GitHub and their wiki, you could certainly label that data and put it into a vector database so that questions about Python are associated with articles about Python, but is there any way for you to make a qualitative labeling to know which data is accurate or which data is used most frequently or which is most up to date or anything like that?
MZ Interesting question. Some of the up-to-date stuff is a little easier for computers to do, like "when was it last updated" kind of timestamps, but for some of the qualitative stuff, you could train models to classify that stuff. Things like pulling out PII or doing named entity recognition– so people, places, and things– can be really helpful as a precursor to understanding where the facts are correct and where they are incorrect. And those models work very reliably at this point, so I think that could be a building block in fact-checking the accuracy of some stuff, and it's a problem we know well. Internally, as every company does, even engineering docs, for example, get out of date internally pretty easily and it's always hard to stay in sync with the code. So I do think that, given all these copilots for code development, a future where that's in sync with your documentation is pretty likely. If it can generate the code, it might as well generate the docs and keep them up to date at all times.
RD So for the AI today, you talk about the tooling and a lot of infrastructure behind it. There's a lot of compute needed to do the inferencing. What are the sort of things that, for a company wanting to get into this or wanting to start up an AI program, what do they need for their infrastructure for the tooling and what do people usually get wrong?
MZ So we've seen that data is the biggest area that people get wrong and take the most time to get right and they kind of overestimate how good their data setup is today when we're in conversations with them. So when you go to a large enterprise, they’re like, “Oh yeah, we've got all the data. It's high quality. It's labeled. We know where all of it is.” And then you start a POC or a production contract, and they're like, “Oh, actually…” When we see it, there's not that much of it or they don't even know where it is internally. So that's usually the biggest thing. And obviously, data is the precursor to customizing any of the AI or even applying the AI over your datasets. So the big suggestion I always have is to get your data in order now because inevitably you're going to be applying AI to it. And data is also kind of your gold mine within an organization. Just like we talked about with enterprise search, the efficiency of your company kind of depends on the quality of your data internally.
BP I want everybody to know that Matthew is not setting me up and he was not paid to do this and that we're really here to talk about Clarifai, but this is where me and Ryan live. We're reading Textbooks Are All You Need, and an AI model can be 1/10 the size of a big model, but if it's trained on Stack Overflow data with accepted answers, it's just as good as the big model, because to your point, the data is so well organized and it's structured in a way that makes sense for a Q&A bot. And so kind of what I was getting to earlier is, if inside of your company you're ingesting emails and Slacks and wikis and stuff, some of that data may be inaccurate, some of it may be contradictory, some of it may be out of date, and that's not as easy to catch when you're doing all this automatic labeling as some of the other things that you can do with labeling. And so having the wisdom of the crowd and the human curation that is Stack Overflow, for example, ahead of time is kind of invaluable if you want, to your point, to tap into this incredible resource that is all the information inside of your company.
MZ Absolutely. And other areas that people should plan ahead for are just understanding the use case that you're trying to solve. I think people see on social media all the different things you can do with ChatGPT and these powerful large language models and large multimodal models. They are really fascinating, but at the end of the day, they cost a lot of money to run, so if you're not solving a real business use case, you're going to waste a lot of time and money. So always start with the use case and work backwards from that, and not every use case can be solved by AI. More and more can, but you should think critically if it is the best solution. You don't have to apply this giant hammer to every nail. And then once you do decide that AI is a good fit for your problem, thinking about that compute is really important. Where do you want to run that? Is it in the cloud, is it on premise? We see this growing trend of people purchasing GPUs, TPUs, AI-specific processors, and we see that that's going to cause a shift back from the cloud to on-prem. And we actually have a new product coming out to help customers organize that compute that they are purchasing, whether it's the cloud or on-prem, in a consistent way so that you can make the most of it, run it efficiently, apply all your AI workloads in that compute. We're deep into the development and actually have a preview of it now, so people can reach out and request access. And so I think that's going to help address this growing problem of, “I purchased racks of GPUs and they're just sitting there.”
RD It's interesting to hear you put a little damper on it. Check your use case, don't get AI, because there's been a big hype cycle around AI. I know some of our listeners have written in to be like, “Stop talking about AI,” but this is sort of part of what happens in technology. There's the new tech and people get hyped up about it. Do you think the current iteration of AI is going to stick around– it's going to blow past the hype and be a lasting tool?
MZ Absolutely. I think the ChatGPT kind of moment, I always refer to that. And obviously there's a lot of companies involved beyond just OpenAI, but that was really the trigger that made everybody realize, “Wow, this is something different. This is something that we only thought humans were able to do, interact in natural language, think about things,” and now they're getting more fascinating with the ability to understand audio and visuals combined with that text. And that was kind of the spark that ignited a whole exponentially fast-growing research community. I think people were playing with language models for years, transformers are not brand new. It wasn't that that was the first transformer model, but putting it out there and having everybody just be able to play with it in an easy way kind of sparked this fire, basically, and I don't think that's going to go away. I think what people are now realizing is that it was very easy to get the prototypes, but actually running this 24/7 in production in a cost effective way, and not just your one toy use case but 100 use cases in a large enterprise, that becomes very difficult. And that's where platforms like Clarifai are meant to help you organize that and give you all the tools in one place.
BP So getting back to Ryan's earlier question, you were there right at this amazing moment in 2012/2013, and obviously I'm sure you saw a huge change in what customers were interested in post-November 2022. But in that in-between time, there were lots of companies that were applying AI at scale, whether that was for content recognition or high frequency trading or all kinds of things. So are you continuing to see a lot of customers who are using systems that are not what we think of? They're not Gen AI, they're not transformers, they're a different approach, they're a different architecture. And if so, what are they using them for? Is that also a growing industry? And maybe sometimes, like you said, do you guide people there? They come in and they say, “All right. Well, we’ve got to get Gen AI. Everybody's got Gen AI. Tell us what to do. We’ve got the data,” and you say, “Well, actually this much simpler version of AI or this more deterministic version of AI is going to be a better tool for you.”
MZ For sure. Gen AI is not the solution for a lot of use cases. It's maybe the most interesting thing, that's why it has spread, but it is actually not what you want in many scenarios. You want the deterministic, predictable AI for most scenarios. So things like moderating content, companies like OpenTable use us. All the restaurant images you've seen have gone through Clarifai to make sure they're wanted on the site and they don't have any offensive stuff. So it's a very practical use case. You don't actually want slow models, expensive models. You don't want nondeterministic models in that scenario.
BP Generating new images of the restaurant based on a couple of keywords.
MZ Exactly. And then you show up and you're like, “What happened?” But another good example is military applications. We do a lot of work with the public sector in the Department of Defense and intelligence community and their decisions are very, very important. They could mean life or death, and so you want the AI to be as accurate and reliable as possible. And that's where non-Gen AI has a lot of use cases. And I see in the future, though, these things are just blending. What people talk about as Gen AI is kind of able to create content, but it's also just a different way of interacting with these models because you can actually talk to them in natural language, and I think that's going to influence the way we do more predictive AI as well and they're going to kind of fuse. But you shouldn't have to collect a dataset of a million examples in order to get a highly accurate model. In the near future, you should be able to ask it to recognize what you care about and then get an efficient, reliable, highly accurate model as an output. So I think that's going to be an exciting future direction.
RD So speaking of future directions, you mentioned foundation models and your own foundation model. We know about a bunch of foundation models now. Everybody is trying to train the next generation of them. Do you think we'll come to a point where there's not going to be any additional room to grow in those foundational models, that they'll sort of settle down and we'll have the gold standard emerge?
MZ It's interesting. Every day there's something new. So right now in the foreseeable future, that's not going to happen. I think it's getting harder and harder, though, for just anybody to create these models, given the amount of compute they need and the amount of data that they kind of feed off of, so I think smaller players are at a big disadvantage there. People like Meta have stepped up to help in a way by kind of lending their data and compute, so then it gives you a good baseline that you can then fine-tune. And that's really caught up to the closed source models and I think that's really helping the community stay up to speed. But it's also a race to the bottom it feels like. It doesn't seem like any one model is very differentiated at this point and every one that comes out is a little bit higher on these benchmarks. We urge our customers that a public benchmark is a good start, but you’ve really got to benchmark on your own data to know if this model is better than this model or if this model is overkill for your use case.
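The advice to benchmark on your own data can be sketched in a few lines. Everything here is an assumption made for illustration: the two "models" are stand-in keyword rules rather than real model calls, and the four labeled examples play the role of a company's own evaluation set.

```python
def accuracy(model, examples):
    """Fraction of (text, expected_label) examples the model gets right."""
    correct = sum(1 for text, expected in examples if model(text) == expected)
    return correct / len(examples)


# Stand-ins for two candidate models; in practice these would call real models.
def model_a(text):
    return "spam" if "free" in text else "ok"


def model_b(text):
    return "spam" if any(word in text for word in ("free", "cheap", "buy")) else "ok"


# A small labeled sample drawn from your own data, not a public benchmark.
examples = [
    ("great pasta and friendly staff", "ok"),
    ("buy cheap watches now", "spam"),
    ("click here for free money", "spam"),
    ("lovely patio seating", "ok"),
]

scores = {"model_a": accuracy(model_a, examples), "model_b": accuracy(model_b, examples)}
print(scores)
```

The same harness works for checking whether a cheaper model is already good enough: if the smaller candidate matches the larger one on your examples, the larger one may be overkill for that use case.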
BP We've been doing some writing and some research recently on, hypothetically, what if we run out of data or all the big frontier models are kind of converging into this almost commodity space because they're all using the same datasets to train on. Do you internally or with clients ever work with synthetic data and do you have any thoughts on its utility and to what degree it's useful and to what degree it might also be something you need to be cautious about, reading about model collapse or degradation of performance or things like that?
MZ I think it's useful. We're seeing– I don't know if it's traditional or synthetic data generation, but this semi-supervised training I think is really, really important where you're not having to rely on carefully curated labeled data for the whole set, but kind of what I was talking about earlier, the auto data labeling. We're seeing some of these foundation models do that at massive scale, and even computer vision foundation models to segment out pixel by pixel, that kind of stuff that are trained not by having humans draw on millions of pictures, but the models get good, they start labeling the pixels and they feed off of that. And when you apply them to enough images, they just surpass any kind of human labeled dataset’s accuracy. It's kind of fascinating. If you feed in enough data, you get a better result than a small dataset that humans can keep up with.
BP And so what's the ‘semi’ part of that? Where does the supervision come in?
MZ So humans start the process and label a little bit, you get a model and you start applying that to bigger and bigger datasets. Eventually the last phase is that you're just using the last version of the model to label.
BP I see, I trained a model that says that every time I show you a picture, if it's a bird, put it in this pile, and then that model can take out all the birds and then later somebody can go and use that data knowing that it's been well-labeled, even if those all haven't been hand-labeled, but a model that can identify birds has labeled them and they're useful for that.
MZ Yeah, exactly. And across enough different categories and enough variety of data, it can create a model that far exceeds the smaller data scales. So that's really, really interesting to me.
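The loop described in this exchange, where humans label a little and the model labels the rest, is often called self-training or pseudo-labeling. Below is a toy sketch of it, assuming one-dimensional data and a nearest-centroid classifier chosen only to keep the example short; the margin threshold, function names, and numbers are hypothetical.

```python
def train_centroids(labeled):
    """Average the values seen for each label (a tiny nearest-centroid model)."""
    sums, counts = {}, {}
    for value, label in labeled:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, value):
    """Return (label, margin); margin is how much closer the best centroid is
    than the runner-up. Requires at least two labels."""
    ranked = sorted(centroids.items(), key=lambda kv: abs(value - kv[1]))
    best_label, best_center = ranked[0]
    margin = abs(value - ranked[1][1]) - abs(value - best_center)
    return best_label, margin


def self_train(seed, unlabeled, min_margin=1.0, rounds=3):
    """Grow the labeled set with the model's own confident predictions."""
    labeled, remaining = list(seed), list(unlabeled)
    for _ in range(rounds):
        centroids = train_centroids(labeled)
        still_unlabeled = []
        for value in remaining:
            label, margin = predict(centroids, value)
            if margin >= min_margin:
                labeled.append((value, label))  # pseudo-label accepted
            else:
                still_unlabeled.append(value)   # too ambiguous, keep for later
        remaining = still_unlabeled
    return train_centroids(labeled), labeled, remaining


seed = [(1.0, "small"), (9.0, "large")]   # the part humans labeled
unlabeled = [0.5, 2.0, 8.0, 10.0, 5.2]    # the part the model labels
centroids, labeled, remaining = self_train(seed, unlabeled)
print(len(labeled), "labeled;", remaining, "left for a human")
```

Only predictions with a clear margin are added back, which is one common guard against the model reinforcing its own mistakes; ambiguous items stay in the pile for a person to look at.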
RD We're getting to the point of the hype cycle where governments are getting involved and trying to put some guardrails around the data usage and stop bad use cases. Do you think that will have a beneficial effect on the future of AI?
MZ It's tough with regulation. I think it even personally makes me curious when you see these models that can generate video and audio tracks, where did they get that data? It just seems like to get the quality of generations that we're seeing from some of these most recent models, they must've trained on massive amounts of that data and I can't see how they could have gotten the rights to all that. So at some point it just feels not fair to some of the original creators. With text, I don't know, there's a lot of precedent on web crawls and that kind of stuff so it becomes more of a gray area, but for the visual content, there's been whole industries like stock media that own the rights to that content. And so that seems like more regulation is needed and more thought there on the ownership, but I also don't think we should slow down the innovation. I think that would be very detrimental because other countries that don't treat regulation as seriously will just surpass us. And as we work with military, that's a very top of mind subject for them. If you fall behind on the national security aspect, that's scary for everybody.
BP All right. We'll end here. We're going to go out on a limb here. First, I'll just say a quick note. I think what you said is interesting– how do you determine rights holders? We just published something with some folks from the Data Provenance Initiative who are looking into this, and also the fact that because of worries about are folks taking our data and is that going to harm our business, a very large percentage of what used to be publicly available web data is now being restricted or put under terms of service or taken down. So the ecosystem of the Common Crawl is changing. And for Stack Overflow, we recently published a couple of blog posts laying out some of our theses on this, and in a nutshell, if a knowledge community is generating a ton of really useful data, then AI that trains on it should license the data and the money should go back into keeping that knowledge community alive, similar to maybe the way you would look at an artist whose work is used to train and getting some recompense for it. But you brought us to national security, and one of your former professors, Geoff Hinton, has made a sort of crusade now to warn people about the dangers of AI. So I guess I'll ask you, as someone who's been involved in this since 2012, what are your thoughts on frontier stuff? What are your thoughts on AGI? What are your thoughts on to what degree people should be worried or not worried about this? Obviously we all know it can be an incredible tool for creativity, for improving productivity, for automation. What are the things that you think about when it comes to, well, you mentioned national security, when it comes to hearing what your old professor had to say, or just the things that you think are coming down the pipe based on discussions probably with your clients and your staff?
MZ It's a tricky subject. The technology today is impressive but it's nowhere near AGI. There was actually a paper out of Apple I think a couple of weeks ago talking about how these models aren't actually doing reasoning. They make you think that they're doing reasoning with this kind of chain of thought, multiple steps, all that. But they carefully crafted some datasets to evaluate whether they can actually reason and basically inserting data into the standard questions would confuse it completely and the accuracy would plummet. So they couldn't actually reason what is the important part of a sentence or prompt and what's the noise. So I think we're a long way away from these models taking over. I think what's concerning people like Geoff Hinton– my hunch, I haven't talked to him recently about this stuff– but one thing that personally concerns me is that we don't understand how these models work very well, whereas with convolutional neural networks for example, I actually had a paper– my most cited paper was around visualizing and understanding convolutional neural networks. And we could show that as these layers build up, it would actually learn different patterns within the image automatically, edges in the first layer, then you'd see T-junctions and circles in the second layer, and eventually you'd see an eyeball and then a face. And so those things are very well understood. With transformers and given the size of these models, I don't think anybody really understands how they're doing what they're doing and I think that gives people pause. This is a little scary. If we don't understand how they work, are we going to be able to understand their limits and can we stop them? That kind of stuff comes into question. So I think more research should happen in that area of really understanding these things and hopefully that'll paint a better picture for the future.
BP Great.
[music plays]
BP All right, everybody. It is that time of the show. We want to shout out someone who came on Stack Overflow, shared a little bit of knowledge, maybe shared a little bit of curiosity, and helped other folks around the community to grow and learn. So a Populist Badge was awarded to Jay Wick: “How to get an image preview before uploading in React.” If you've ever wondered, Jay has an answer. It's so good that it has more upvotes than the accepted answer. So congrats, Jay, on your badge. I am Ben Popper. I'm the Director of Content here at Stack Overflow. You can always find me on X. They went and changed the URL, so I guess I can't call it Twitter anymore, @BenPopper. If you have questions or suggestions for the show, you want to chat with me and Ryan, email us, podcast@stackoverflow.com. And if you liked the conversation today, do me a favor, subscribe and come back again.
RD I'm Ryan Donovan. I edit the blog here at Stack Overflow. You can find it at stackoverflow.blog. And if you want to reach out to me with show ideas, comments, questions, praise, as always, you can reach out to me on LinkedIn.
BP Matt, say your full name, what it is you do, where you want to be found online if that's a website or social, and then where people should go to learn more about Clarifai.
MZ Sure. Matt Zeiler, I'm the founder and CEO at Clarifai. You can find me on X @MattZeiler. And you can sign up and try Clarifai for free at Clarifai.com.
BP Very cool. All right, everybody. Thanks for listening and we will talk to you soon.
[outro music plays]