The Stack Overflow Podcast

This startup uses a team of AI agents to write and review their pull requests

Episode Summary

In this episode we chat with Saumil Patel, co-founder and CEO of Squire AI. The company uses an agentic workflow to automatically review your code, write your pull requests, and even review and provide opinions on other people’s PRs. Different AI systems with specific capabilities work together as a mixture of experts, following a chain of thought approach to provide recommendations on security, code quality, error handling, performance, scalability, and more.

Episode Notes

You can learn more about Squire AI here

Connect with Patel on his LinkedIn

Congrats to Bharath Pabba for earning a Great Question badge and helping 129,000 people with a similar question by asking: 

How to disable source maps for React JS Application?

Episode Transcription

[intro music plays]

Ryan Donovan Monday Dev helps R&D teams manage every aspect of their software development lifecycle on a single platform– sprints, bugs, product roadmaps, you name it. It integrates with Jira, GitHub, GitLab, and Slack. Speed up your product delivery today, see it for yourself at monday.com/stackoverflow.

Ben Popper Hello, everybody. Welcome back to the Stack Overflow Podcast: AI is getting flirty edition. I am Ben Popper, Director of Content here at Stack Overflow, joined as I often am by my colleague and compatriot, Ryan Donovan. Ryan, you and I were on a podcast just yesterday with somebody who's building a tool that helps evaluate engineering teams and also works in the realm of understanding code gen. And their hot take, which will be our headline, is that in two to three years, there will be no junior engineers. That role will be gone. Not to say that people coming out of college won't work in software, but they won't work as junior engineers because that will be the AI's job. And obviously Devin and other AI agents have raised a lot of eyebrows and a lot of money and people are just very interested in what's going on. Simultaneously, I think there's something different happening in the labor market, which is that there was a big change in interest rates and suddenly it wasn't all about growth– it was about profit. There were a lot of big layoffs, and you and I have interviewed multiple people and I've met people in my own life who worked in tech for the last 10 years and for the first time in a decade cannot easily find a new job. And it's unfortunate. It's a harsh reality what's going on with the people who listen to this podcast and who we try to work with at Stack Overflow. So today we are lucky to have a guest who's going to be chatting about some of these things with us– Saumil Patel, CEO and co-founder over at Squire AI, about some of what they're building, including tools that can help folks find new meaning in their code base. So without further ado, Saumil, welcome to the program. 

Saumil Patel Thank you for having me. It’s great to be here.

BP I heard you and Ryan chit-chatting about San Francisco before I got on, so give folks the quick flyover. How did you start in the world of software development, what was your education and early career like, and what led you to being a founder of your own startup? 

SP Absolutely. I'm a huge gamer. So my early, early career was me making websites for my guild back in the day and that's how I got into programming. And I decided to go to school for it because I just loved it so much. I went to school for computer science in Ottawa at Algonquin College. Right out of college, I started my first startup and I've been in that startup scene ever since. I've been working on developer tools for the past five years.

BP So working on developer tools for the past five years, does that encompass the founding of this company or did you work for some other folks first and then decide to start your own thing? 

SP That encompasses the founding of this company. So we actually started a few years ago and we were working on just in time documentation for developers directly in the IDE so that you can document the code as you're writing it and we would help you make sure that your documentation doesn't rot over time. As LLMs came onto the scene, it obviously kind of pushed that to the wayside because now you can just start to find that meaning by just passing your code to LLMs. So we've been through a couple of iterations. 

BP Interesting. I guess that means in some ways we're competitors. I've heard people bandy around just in time documentation as one of the things that Stack Overflow for Teams is meant to do. You ask a question, somebody gives you an answer, then that knowledge artifact is stored. And similar to you, we announced OverflowAI. You're no longer going to have to go find that in a human-centric way. You're going to ask the universal librarian powered by an LLM and they'll pull up a synthesis for you or point you in the right direction or explain what the code or the comment means. So talk to us a little bit about the evolution of the company, because I think that might be an interesting way to sort of start this. What was your first MVP, and how has that evolved to what it is today, with LLMs emerging as this powerful new technology that almost feels like it's becoming a new layer of technology? I've heard people we've talked to on the podcast say that this is going to be like cloud, or this is going to be like containers, or this is going to be like X. It will be baked into the tech stack of every company in some way. 

SP Absolutely. So we were doing just in time documentation and LLMs kind of knocked us out of that. And we went through a couple of iterations, so we moved towards code ownership and we decided to kind of pivot into this idea of helping you understand who's responsible for what part of your code base, so that was another iteration that we did. And the most recent iteration is Squire AI, and with Squire AI, our objective is to create a suite of agents that developers can use to help them automate smaller tasks within the software development life cycle and that's kind of where we're headed right now. So you may have seen this idea of an agent that can just replace software developers. We don't necessarily agree with that. We're not there yet, and we probably won't be for several years. And on the other end, you have the autocomplete of Copilot that's directly within your IDE. We think that the future is somewhere in the middle where we use that LLM, that agentic idea of being able to take a task and bring it to completion, but having it be atomic and be very specific to either test specific pieces of code or review specific pieces of code or document pieces of code or maybe even help you write functions. And that's where we're headed right now with Squire AI, to build that suite of agents that you can leverage as you're writing code to help you along the way instead of replacing you. 

RD Are you talking about putting AI agents in at build time? 

SP Exactly. We are adding AI agents at build time, but we want to add them at every layer. So when you're doing research, when you're writing the code, when you're building in your CI/CD, agents can permeate the entire software development lifecycle to help you each step of the way. Today, we're starting with reviews, so when you create a pull request, our agent comes in, our agent traverses the code base to make sense of the changes that have been made, to help you understand those changes, and also to help guide you and give you constructive feedback, not just from a, “Here is the diff and I'm going to pass it into an LLM and give it to you.” We go way beyond that. We search for things, we search for symbols, we search for meaning in the codebase, we search your documentation to give you constructive feedback based on that context awareness that we've built. 
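For a sense of what that kind of context gathering can look like, here is a rough Python sketch: pull the identifiers touched by a diff, look up where they are defined in the repository, and hand that context to a model along with the diff. It is illustrative only, not Squire AI's actual pipeline, and `call_llm` is a placeholder for whatever model client you use.

```python
# Rough sketch of context-aware review: collect symbols from a unified diff,
# find where they are defined elsewhere in the repo, and feed that context
# to a model alongside the diff. Not Squire AI's implementation.
import re
from pathlib import Path

def call_llm(prompt: str) -> str:
    """Stand-in for a real model call (OpenAI, Anthropic, a local model, etc.)."""
    raise NotImplementedError

def changed_symbols(diff_text: str) -> set[str]:
    # Very naive: collect identifiers that appear on added or removed lines.
    symbols = set()
    for line in diff_text.splitlines():
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---")):
            symbols.update(re.findall(r"\b[A-Za-z_][A-Za-z0-9_]{2,}\b", line))
    return symbols

def find_definitions(repo_root: Path, symbols: set[str]) -> list[str]:
    # Grep-style search for definitions of the touched symbols.
    snippets = []
    for path in repo_root.rglob("*.py"):
        text = path.read_text(errors="ignore")
        for name in symbols:
            if re.search(rf"\bdef {name}\b|\bclass {name}\b", text):
                snippets.append(f"# {path}\n{text[:800]}")
    return snippets

def review(diff_text: str, repo_root: Path) -> str:
    context = "\n\n".join(find_definitions(repo_root, changed_symbols(diff_text)))
    prompt = (
        "You are a code reviewer. Here is context from the repository:\n"
        f"{context}\n\nReview this diff for correctness, security, and style:\n{diff_text}"
    )
    return call_llm(prompt)
```

A real system would use a parsed symbol graph and documentation search rather than plain regex, but the shape is the same: gather context first, then ask for feedback.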

RD I think that shift from code gen to agents is interesting, and I think that code review part is pretty key to having an agent that can sort of reflect and not just pump out code. It can be like, “Well, is this good code? Does this fit with the documentation? Does this align with this best practices document?” How do you see agents operating, especially on this atomic level you're talking about? 

SP So there are several different techniques that are emerging right now in terms of how companies are using agents, or how people are developing agents even. What we personally believe is that agents will be atomic and there'll be tiny agents that will work together with other agents to achieve bigger and bigger tasks over time as LLMs become more and more capable. So we are seeing these specific patterns that people are using with agents that include reflection, tool use, planning, and multi-agent collaboration, and together as you combine all of those pieces, agents are able to give each other feedback and they're able to utilize each other. So one of the things that we do is we have agents that go and do research, then we have an agent that is responsible for reviewing the entire diff, and then we have agents that are responsible for reviewing parts of the diff. That allows us to have fine-grained control over what each individual agent knows and doesn't know, to avoid the confusion that might happen if you're, let's say, looking at a diff that is a thousand lines long and you don't necessarily want to start going into different parts of the code base and getting confused about what you were actually doing.
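A minimal sketch of that kind of agent hierarchy, under the same assumption that `call_llm` is a placeholder for a real model client: one narrow agent per diff hunk, one agent for the diff as a whole, and a parent agent that merges their notes. This is illustrative, not Squire AI's implementation; real systems layer reflection, tool use, and planning on top.

```python
# Toy sketch of small agents working together on a pull request review.
def call_llm(prompt: str) -> str:
    raise NotImplementedError  # plug in your model client here

def split_into_hunks(diff_text: str) -> list[str]:
    hunks, current = [], []
    for line in diff_text.splitlines():
        if line.startswith("@@") and current:
            hunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        hunks.append("\n".join(current))
    return hunks

def hunk_reviewer(hunk: str) -> str:
    # A narrow agent: it only ever sees one hunk, so it can't get lost
    # in the rest of a thousand-line diff.
    return call_llm(f"Review this single diff hunk and list concrete issues:\n{hunk}")

def whole_diff_reviewer(diff_text: str) -> str:
    return call_llm(f"Summarize the intent of this diff and flag cross-cutting risks:\n{diff_text}")

def parent_agent(diff_text: str) -> str:
    # The parent combines the broad view and the narrow views into one review.
    notes = [whole_diff_reviewer(diff_text)]
    notes += [hunk_reviewer(h) for h in split_into_hunks(diff_text)]
    return call_llm("Merge these review notes into one coherent review:\n" + "\n---\n".join(notes))
```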

BP One thing that came up yesterday, I joked about flirty AI because yesterday OpenAI showed off their latest iteration of ChatGPT, and it was much more conversational. And not only that, but it brought humor and almost an affectionate attitude towards the person that it was having a conversation with. It's interesting because I heard Sam Altman on the All In Podcast, and he was saying something similar to you. We think that there's going to be these high-level reasoning agents that are created by the largest AI companies, and then they will go out and pick from a series of models or tools, and that will empower them to do all these things that they weren't sub-trained on. There won't be one agent you have for your coding and one agent you have for your language and one agent you have for your biology. There'll be a master model that's really good at reasoning, and it will know, “Okay, I can go out and hit the API call for the product that you're building when I need to do X, Y and Z.” So in that sense, I feel like there is maybe a consensus forming around what's going to happen, although there's no sense in trying to predict where we're going to be in a year or two here. But let me ask you a question. One of the things that Sam Altman said he wanted from an AI was something that was like a great senior employee willing to challenge him when it felt like it had a better suggestion or it was asked for an idea and it said, “I'll think about that, but just so you know, I'm not sure that's the best idea.” And they kind of showcased that yesterday with a developer getting ready for an interview. “How do I look?” “Well, maybe you'll pull off the sleepy coder thing, but not great.” “Okay, what if I put on this hat?” “I wouldn't go with the hat. You look better without it.” That's, to me, a really interesting new wrinkle, which is that the AI has opinions. And so obviously when it comes to, “Hey, will you write a function for me or leave comments on this code?” the AI might then bring an opinion. What do you think about that?

SP Absolutely. I think that is the direction we're headed in. I think it's important to mention this paper. HuggingGPT is a paper that I actually read a few weeks ago, I believe, and it actually goes into how you can give this agent a task and it's able to go on Hugging Face and find different models and leverage them to achieve that task. So Andrew Ng, he actually demonstrated this in one of the videos where he actually had a picture of a child on a scooter and he said, “I want to see a girl reading a book in the same pose.” So basically the model went and found other models that could help it figure out what the pose looks like and then generate an image of a girl reading a book. So it went through several different steps and it selected the right models to achieve that task. And so we're definitely headed in this direction. And with tools like Refraction, for example, our agent that reviews code, the objective is to give criticism. The objective is to look for things that are missing or inaccurate or not done the right way, whatever the right way might be, and then you can use specific processes like tree of thought or chain of thought to really try and figure out if that is the way to go. Tree of thought could be a really good example of planning to use in that scenario where you can say, “Here are five different pieces of criticism we can provide. How do they lead to a better outcome in the end?” And you can use the LLM to reason and try to find that best possible path. And maybe it's no criticism, maybe it's aggressive criticism, or maybe you're just going, “Here's a suggestion.” So we're definitely headed in that direction where LLMs should be able to have this kind of divergence of thought and then come back to give you something that makes the most sense, whatever the most sense is. 
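One way to picture that tree-of-thought-style pass over review feedback, as a hedged sketch rather than a description of Refraction: generate several candidate criticisms, ask the model to score how much acting on each would improve the outcome, and keep only the strongest, which may be none at all. The prompts, the scoring scale, and `call_llm` are assumptions for illustration.

```python
# Minimal sketch of ranking candidate review feedback before surfacing it.
def call_llm(prompt: str) -> str:
    raise NotImplementedError

def candidate_criticisms(diff_text: str, n: int = 5) -> list[str]:
    out = call_llm(f"List {n} distinct, numbered criticisms of this diff:\n{diff_text}")
    return [line.strip() for line in out.splitlines() if line.strip()]

def score(criticism: str, diff_text: str) -> float:
    reply = call_llm(
        "On a scale of 0 to 10, how much would acting on this feedback improve "
        f"the change? Reply with a number only.\nFeedback: {criticism}\nDiff:\n{diff_text}"
    )
    try:
        return float(reply.strip())
    except ValueError:
        return 0.0

def best_feedback(diff_text: str, keep: int = 2) -> list[str]:
    ranked = sorted(candidate_criticisms(diff_text), key=lambda c: score(c, diff_text), reverse=True)
    # If every candidate scores poorly, the right answer may be no criticism at all.
    return ranked[:keep]
```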

RD It’s interesting you mentioned getting different models. With agentic workflows, with them generating the code and criticizing the code, do you need different models? Is there a risk of it all becoming the Obama-putting-a-medal-on-Obama meme? 

SP So that's something that we're actually currently investigating. We want to use models like Code Llama to have agents write code because those models don't need to be trained on as much data as, say, GPT-4, where you're using this really, really large and heavy model to do every task, but you could likely achieve similar, if not better, outcomes when that model is constrained into learning about a specific thing and there's less confusion in it. And so what we're currently trying to figure out and go towards is building these very specific models, and the key is to figure out where you need a separate model and where you don't. A great example is writing code. You could be writing code in just Python, and we know which language we're writing in when we're writing code, so we could have specific models for each of these languages that are proficient at writing code in that particular language so that you're following the standard and the model is not getting confused. When I'm using ChatGPT or some other models, I've had instances where the model starts to write code in a different syntax somewhere in the middle, and it causes syntactical inaccuracies, for lack of a better word. And so we can get rid of those kinds of issues. 
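A toy illustration of that per-language routing idea: detect the language from the file extension and hand the task to a smaller, specialized model, falling back to a general model otherwise. The model identifiers and the `generate` client below are hypothetical placeholders, not recommendations or real API names.

```python
# Hedged sketch of routing code-writing tasks to language-specific models.
MODEL_BY_LANGUAGE = {
    ".py": "codellama-python-7b",   # hypothetical identifiers
    ".ts": "codellama-7b",
    ".go": "codellama-7b",
}
FALLBACK_MODEL = "general-purpose-large-model"

def generate(model: str, prompt: str) -> str:
    raise NotImplementedError  # plug in your inference client

def write_code(filename: str, task: str) -> str:
    ext = "." + filename.rsplit(".", 1)[-1] if "." in filename else ""
    model = MODEL_BY_LANGUAGE.get(ext, FALLBACK_MODEL)
    return generate(model, f"Write code for {filename}: {task}")
```

The point of the routing table is exactly the trade-off described above: a constrained model that only ever writes one language is less likely to drift into another syntax mid-file.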

BP I think that's a really interesting idea. There's an idea of overfitting or overtraining. I remember somebody saying, “Yes, we could get some of these large language models to be great at speaking not just English, but French and German and Italian. But when you get them good at the other languages, you see a slight decrease in their English language proficiency.” I don't know if it will always be that way, but let's say we want to have it be able to answer questions about code in any language, and that creates a decrease. So then you say there's a master model and then there is the Hugging Face number one for each particular language that it can make an API call to when it needs to make a request in that language. I'd be curious to know: what's the tech stack like at your company? As you're trying to build these things and as you pivoted from not working with LLMs to working with LLMs, what kind of languages and frameworks are your engineers using and what new technologies or techniques have you had to adopt, like vector databases or embeddings or RAG or anything along those lines? 

SP We're using a very large set of different technologies. In terms of languages, we’re a mostly Python and TypeScript shop. We are using graph databases, we're using embeddings, we're using multiple different models, we're using agentic workflows, to name a few.

RD So for the agentic workflows, where does that magic actually happen? Is there some sort of massive system prompt on it? Are there specialized data sources? What's the thing that makes an LLM into an agent? 

SP So to turn an LLM into an agent, you can largely think of it as an LLM in a for loop. Obviously it's more than that. It's the ability for that for loop to think and act and then reconsider and think and act. So you're really kind of putting a large language model in this for loop to achieve a certain outcome, and you can control the outcome it achieves by providing it specific tools. So one of the things we've done is we've designed our system to allow agents to use other agents as tools as well. So tools you can largely think of as API endpoints or function calls, but you can also just call another agent in that same manner to go and do this research or potentially write a piece of code, and that agent is responsible for making sure the syntactical accuracy exists, so it may end up parsing that code, making sure that it's syntactically accurate. It may invoke other agents to come and help review the code before it goes back to that parent agent. So we've built this hierarchy of agents that are able to take a specific task and hand it off to each other to achieve specific outcomes, and all of that rolls back to that parent agent that is then responsible for providing that outcome back to you. So that's the agent you're mostly interacting with, but along the way that agent is interfacing with many different tools and many different other agents, and it's traversing the graph database to understand how your code base is structured, because we parsed it and we understand how that data is supposed to flow.
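The “LLM in a for loop” idea can be sketched in a few lines of Python: ask the model for its next action, execute that action as a tool call, feed the observation back in, and stop when it says it is done or hits a step cap. The tool names, the DONE convention, and `call_llm` are illustrative assumptions, not Squire AI's design; note that another agent could be registered as a tool in exactly the same way.

```python
# Minimal agent loop: think, act, observe, repeat.
from typing import Callable

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # stand-in for a real model client

def search_codebase(query: str) -> str:
    return f"(search results for {query!r})"  # placeholder tool

def syntax_check(code: str) -> str:
    # A tool the agent can use to verify code it wrote is syntactically valid.
    try:
        compile(code, "<agent>", "exec")
        return "ok"
    except SyntaxError as e:
        return f"syntax error: {e}"

TOOLS: dict[str, Callable[[str], str]] = {
    "search_codebase": search_codebase,
    "syntax_check": syntax_check,
    # "review_agent": another_agent,  # an agent can itself be exposed as a tool
}

def run_agent(task: str, max_steps: int = 10) -> str:
    history = f"Task: {task}\nAvailable tools: {', '.join(TOOLS)}\n"
    for _ in range(max_steps):  # the cap keeps a wayward agent from running forever
        decision = call_llm(history + "Reply either 'DONE: <answer>' or '<tool_name>: <input>'.")
        if decision.startswith("DONE:"):
            return decision[len("DONE:"):].strip()
        name, _, arg = decision.partition(":")
        result = TOOLS.get(name.strip(), lambda a: "unknown tool")(arg.strip())
        history += f"\n{decision}\nObservation: {result}\n"
    return "Stopped after max_steps without finishing."
```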

BP So the other thing Sam Altman said in the interview this week was, “Well, one of the reasons that we haven't been able to make the ChatGPT Turbo available to everybody is the cost.” The cost of the inference is quite high. Now, they did announce something new yesterday which they said is going to be free to everybody, although it's not out in public yet, but, okay, you said one thing and now you're kind of doing another. Have you come up with some new technique internally that's going to give you some great new efficiencies on the algorithmic side or the cost side or whatever it may be? Talk to us a little bit about the business model that you're trying to build as you've moved into the world of LLMs. We talked about your tech stack. How do you envision yourself charging users and how do you balance that against the costs of running models or training models or hosting models or doing inference? 

SP So currently we have per-seat pricing, and that's kind of more predictable. What we've heard from the market is that usage-based pricing is just unpredictable and it becomes very hard to understand the value that you're getting for the amount of money you're paying. So we are focused on per-seat pricing at the moment. And to control that cost on the other end, that is precisely why we're focused on running more specific models that are not as heavy, so that we can get that same level of quality, if not better, and have that reasoning be done by a foundational model to make sure that we're considering everything, but then when those tasks get down to reviewing specific code, specific types of code, writing specific types of code, we can start to use models that are significantly smaller but just as good. And that's where we're starting to kind of balance that cost versus how much value we can provide and how much value we can extract from businesses.

BP Neat.

RD So I want to jump off of that to say that, with all these AI agents going around, there's going to be increasing demand on data centers, increasing energy demands, an increasing need for materials innovation. Do you think there will need to be some sort of additional non-computing innovations for the AI revolution to happen?

SP Absolutely. I think as we start to put these LLMs in a for loop, it's definitely going to increase the rate of inference. For example, most recently I had to push a patch because one of our agents just kept going as it kind of took a wrong turn at some point. So we're still going through the motions of really tightening things down to make sure that agents aren't just going on their own and going for hours and hours. And in the future, we may want that, but obviously, like you said, there's an energy conversation there, there's a cost conversation there, there's a compute conversation there, and all of those resources are constrained. And so definitely we will need innovations in all of these different places to be able to support this future agentic workforce that we'll be utilizing to help us achieve our day-to-day tasks. Actually, this is really funny– I'm a huge gamer and I have a PC sitting on my desk at home, and especially when I lived in Canada, it was at my feet with a GPU in there, and it would just go, pumping out so much heat. So I was actually having a conversation with my co-founders about why more people don't just have GPUs as heaters in their home and sell that compute. I personally foresee an industry existing where you can sell heat. 

BP I had a plan going back when I lived on a farm: I'd rent the GPU out, it would heat the chicken coop in the winter, and I'd capture it on both sides.

SP Ben, we’ve got to work on this together because that is exactly what I'm talking about. So we can definitely start to do these multiple steps to try and extract more value because we're generating heat either way and it's just as efficient. 

BP No, I agree. I think one sort of funny thing here is that if you listen to some of the big conversations with Zuckerberg and Altman, they're saying the bottleneck here is going to be building new data centers, but more than that, getting them energy. I just read a big story yesterday in The New York Times about how antiquated the United States grid is and that all this money that was put out there from the Inflation Reduction Act to add wind and solar to the grid is irrelevant because the grid can't handle it and it can't transfer it where it needs to go. So I think energy is actually the bottleneck in a lot of ways, which is ironic and also one of the bigger things that I think is slightly dystopian. We're building AI and it's good and I hope it improves people's productivity and makes people's lives better, but also we're just screaming our way towards climate change and building as many big new data centers as we can to suck up as much energy as we can. It's just sort of like, “Oh boy, what are we doing here?”

SP Yeah, absolutely. I think that one of the key things is going to be really kind of leveraging that idea that people should own their AI. I'm a huge fan of open source AI. I love what Meta is doing. So in the future, I do envision people owning their own AI and having that computer in their home that is serving a dual purpose, so there's definitely a future there. 

BP Actually, you mentioned something interesting. You were like, “In thinking about cost, we might select models that optimize for that. So that means we might take a Code Llama 7B, because that's good enough and has low latency, and use that as opposed to the Code Llama, whatever they’re billing, 420B,” because that's overkill, right?

SP Absolutely. So depending on the task, you can be selective about what kind of models you're picking up to get the job done, and you might not need a 400-billion-parameter model when you can get things done with a 7-billion-parameter model, especially if you're starting out or working on a really small project and you don't need to do these very, very complicated things. So especially if you're a student and you're trying to kind of just get going, you can run a 7-billion-parameter model on your laptop and use it without actually having to fork over your entire wallet to just get access to a model that can do maybe 50% better. 

RD And you mentioned the specialized models earlier. I’m sure you read the ‘Textbooks Are All You Need’ paper, about the TinyStories model and Phi using very, very targeted data to create a very targeted LLM. Do you think that's the future, or do you think there will also be a place for the big general models?

SP I think the big general models, in my opinion, will likely act like managers because they do have that great big amount of knowledge that allows them to reason.

BP Right, the broad reasoning. 

SP The broad reasoning, the abstract thinking to be able to draw from so many different data sources, but then the further down the chain you go and the more agents have kind of called each other, you end up at an agent that is just so specialized. So I do think that there is space to have these generalized models, especially the models that you as a human are interacting with, unless you're obviously talking about a coding model that is doing some autocomplete. Most of the time, if you're conversing with a model, I imagine that those will be foundational models. But then when that task starts to happen, when the execution starts to happen, I think we start to get down to these models. So I think largely the models that are going to be in a loop trying to achieve a task will be highly specialized, or at least should be because it is way more energy and cost efficient to do it that way.

BP I heard about a model the other week whose job is to come up with new ideas for CRISPR proteins that you can use to do genetic editing, and then they made that open source so that people don't have to deal with some of the costly licensing that comes with some of the ones that were already made. And they made what they called an LLM, but it was specific to genes, specific to proteins, and so your average LLM is not going to be able to suggest the right sequence of DNA, but it might be able to call on that model if that's the discussion you're having or that's the work you're doing, right?

SP Absolutely.

[music plays]

BP All right, everybody. It is that time of the show. We want to shout out somebody who came on Stack Overflow and shared a little curiosity or knowledge and helped everybody else learn. Awarded five hours ago, a Great Question badge goes to Bharath Pabba for asking, “How to disable source maps for React JS applications?” Bharath, thanks for the question. You've helped 128,000 other people who had this same question, and there's a great answer with 40 upvotes for anybody who has that question and is listening. As always, I am Ben Popper. I'm the Director of Content here at Stack Overflow. You can find me on X @BenPopper. If you have questions or suggestions for the program, if you want to come on as a guest or listen to us talk about a specific topic or tell us to shut up and not talk about a specific topic, you can email us at podcast@stackoverflow.com. And if you like the show, the nicest thing you can do besides sending us money and free swag would be to leave a rating and a review, because you're not allowed to send me money. So leave us a rating and a review. 

RD I'm Ryan Donovan. I edit the blog here at Stack Overflow. You can find it at stackoverflow.blog, and if you want to contact me on X, you can find me @RThorDonovan. 

SP I'm Saumil. You can find me on X @SaumilP_. Our product is Squire AI. You can find us at squire.ai. We can help you review your code within 30 seconds and make sure that you look like a rock star when you push it up to the cloud.

BP Sweet. All right, everybody. Thank you so much for listening, and we will talk to you soon.

[outro music plays]