This is part two of our conversation with Replit CEO and founder Amjad Masad.
If you missed the first part of this conversation, listen to it here.
Replit is a browser-based IDE (integrated development environment). Check out their blog or start coding.
ICYMI: Stack Overflow recently implemented semantic search, allowing users to search using natural language.
Explore Stack Overflow Labs to learn more about OverflowAI and other projects.
Amjad Masad is on LinkedIn, Twitter, and GitHub.
Congratulations to Stack Overflow user macxpat, whose answer to How to install Linux packages in Replit earned a Lifeboat badge.
[intro music plays]
Ben Popper Hello, everybody. This is Part 2 of our conversation with Amjad Masad, the CEO of Replit. We had a fascinating chat which aired last week, and this is the second half, talking all about how and why they built a multiplayer IDE and how they hope to augment developer intelligence instead of automating developers away. Hope you enjoy.
Ryan Donovan So I wanted to talk about open source a little bit. One of the first things I thought of when I saw the multiplayer approach was everybody contributing to open source. How much of the multiplayer stuff is informed by your open source background?
Amjad Masad There's something about open source that is very unique and in a lot of ways very different than the corporate structure for software. In the corporate structure, there's some kind of org chart and hierarchy and rank that informs how people produce software. In open source, it tends to be more based on the work itself and less on the rank or experience. And there are various trade-offs to the approaches but I find it to be more interesting, especially for startups, especially for open source contributors, especially for solo developers that want to collaborate with other people, that those other people are peers as opposed to managers or reports. And so multiplayer is kind of informed by this idea that in Replit there's no sense of different privileges or different interactions, everyone is actually on the same level contributing in the same way. And that unlocked a lot of interesting use cases we hadn't thought about such as education. We see teachers hopping in to help students, especially during COVID, with coding and trying to learn and they would just hop in and see what they're doing and comment on it and all of that. Interviewing– people use it for a lot of coding interviews. And also people building prototypes together; it's really fun to jump in and build a prototype. However, I would say the challenge is that you can't really build big software with real-time interaction. You can't really build large scale software, and that's because whatever I do can affect what you're doing. I can make a syntax error and you want to run the program and now it doesn't compile. So I think next year we're going to approach a time where we're trying to synthesize the two approaches. So you have the totally asynchronous, which is Git, and you have the totally synchronous, which is what we have in Replit today. I think there's somewhere in between where you can jump between the asynchronous and synchronous in a way that gives you the benefits of both worlds.
RD Like adding approvals to code added or something?
AM It's more that I view Git as an underlying protocol. I think Git is a lot of times hard to use and all of that, but it's good as a low level system. So we want to build on that such that, let's say you have a repo and that repo is live, you can see people working in different directions, but everyone actually has their own fork that's a transparent branch, and you can hop into someone's branch intentionally to comment on it, to work on it, but also you can branch off easily and then maybe there's some kind of primitive of merging that is easier than a pull request.
BP So you mentioned before that if somebody asked you about big models and small models, you'd get into it. I'm that guy. The fascinating trend I've been observing in the research, which I think maybe you're alluding to, is that you have these foundational models, and then if you take a more hyper-specific dataset, you can have sort of a parent/child relationship where something with a lot less parameters can get to near parity with a GPT-3.5 or 4. So how do you look at big and small models and how does that allow you to create, as you said, interesting things in house without having to go all the way to hiring a huge ML team and paying out the nose for a billion GPUs? How do you make use of the foundational models and then fine-tune or train your own that can do interesting things that are more aligned with what you want to do?
AM So I always like to take the Apple or Steve Jobs approach of starting from the customer and thinking about the user journey. And when you're coding and we're trying to help you with AI, there are essentially two forms of interaction. One is push, let's call it, and the other is pull. One is we're pushing suggestions on you, and that's the Copilot interaction, and the other is pull, where you're pausing and you are asking the AI for something, or going to Stack Overflow, or going to Google, or what have you. In the push interaction, you can actually have a fairly high error rate and still be useful. The acceptance rate at Replit for suggestions is about 30%. We know that Copilot is around that range as well. So you have a high tolerance for error because you can just keep typing and the suggestion will go away. Autocomplete is always like that. And what that allows you to do is have a small model that's cheaper because you need to trigger it really on every few keystrokes, and fast. It needs to be on the order of 100 milliseconds, at least the median time to respond, otherwise, I think at around 200 milliseconds, the whole interaction breaks and doesn't work. And so that really does commit you to a small model, because if you want something that's cheap that's getting thousands, perhaps millions of queries per second, you need something that is cheap on inference and you need something that's fast. What we found is that the sweet spot for power and speed is the 3 billion parameter model. I think now with the new chips you can go up to 7B, perhaps 12B, because especially the H100s are optimized for it and they can fit larger models because of the VRAM. But at least in the previous generation of chips, what we found is that a 3 billion parameter model for some reason is the sweet spot. If you go down to 2 or 1, it's almost that the model gets too dumb. If you go up to 7 or 8, whatever parameter, it gets too slow.
And so it just took trial and error to figure that out. We tried a lot of different open source models. Salesforce put out a CodeGen model, 2.5 billion parameter, which is what we started with, and which is a really great model. But at the time, the CodeGen model was actually not using the latest state of the art in AI. So again, starting from the customer, we decided to train our own model and we decided to go from scratch, because if we fine-tune a model, there are always potential issues with that. Fine-tuning works when you're taking a distribution that is existent within the base training distribution so that you can fine-tune it, you can make it better at that specific task. But if you actually take data that's totally out of distribution, out of the pre-trained distribution, you might score well on benchmarks, and a lot of these models are scoring well on benchmarks. But something that scores well on benchmarks isn't always good for production. And so we decided to really train from scratch, and we use what we call the time to LLaMa approach, which has now become industry standard. What LLaMa did is they trained way longer on many more tokens than the standard approach, that way you don't need a 300 billion parameter model. You can take the 3 billion parameter, we train it on half a trillion tokens. We have the next version coming out on a trillion tokens. So it was state of the art, it performed better than models 5 and 10x its size, and so that's on the smaller side. Now, when do you need a bigger model? When the user query is so undefined, when the user query is so general and there are no constraints to what people might ask.
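To put the figures in this answer side by side, here is a minimal arithmetic sketch of how many training tokens each parameter sees. The parameter and token counts (3 billion parameters, half a trillion tokens, then a trillion tokens) come from the conversation; the function name tokens_per_parameter is only illustrative and is not something Replit uses.

```python
def tokens_per_parameter(params: float, tokens: float) -> float:
    """Return how many training tokens were seen per model parameter."""
    if params <= 0:
        raise ValueError("params must be positive")
    return tokens / params


# Figures quoted in the conversation.
PARAMS_3B = 3e9               # 3 billion parameters
TOKENS_CURRENT = 0.5e12       # half a trillion tokens
TOKENS_NEXT_VERSION = 1e12    # a trillion tokens for the next version

current_ratio = tokens_per_parameter(PARAMS_3B, TOKENS_CURRENT)
next_ratio = tokens_per_parameter(PARAMS_3B, TOKENS_NEXT_VERSION)

print(f"current model: {current_ratio:.1f} tokens per parameter")  # 166.7
print(f"next version:  {next_ratio:.1f} tokens per parameter")     # 333.3
```

The ratio is only a rough way to express "trained way longer on many more tokens"; it says nothing about data quality or about how the model behaves in production.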
BP Yeah. Like you said, you're going to have these two modalities. One is, “I need you to be at this many milliseconds,” it's almost like being in a gaming environment, “otherwise, by the time your suggestion arrives, I'll have typed past you,” and that's cool that you were able to build that yourselves. And the other one, which I think is great is, “I finished coding for the day. When I come back tomorrow, show me some ideas for this next feature,” and you arrive in the morning and your AI assistant, your automated developer intelligence, has given you a few starting points that you can take from there.
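The two modalities discussed above can be sketched as a small routing and latency check. This is an illustration only, not Replit's implementation: the thresholds (a median of about 100 milliseconds, with the interaction breaking at around 200 milliseconds) are the figures quoted in the conversation, while the names pick_model, meets_push_budget, and still_useful, the "small"/"large" labels, and the sample latencies are all hypothetical.

```python
from statistics import median
from typing import Sequence

TARGET_MEDIAN_MS = 100   # "on the order of 100 milliseconds" median response time
BREAKING_POINT_MS = 200  # around here "the whole interaction breaks"


def pick_model(interaction: str) -> str:
    """Route push (inline suggestions) to a small model, pull (explicit asks) to a large one."""
    if interaction == "push":
        return "small"
    if interaction == "pull":
        return "large"
    raise ValueError(f"unknown interaction type: {interaction!r}")


def meets_push_budget(latencies_ms: Sequence[float]) -> bool:
    """Check whether the median latency of push suggestions stays within the target."""
    return median(latencies_ms) <= TARGET_MEDIAN_MS


def still_useful(latency_ms: float) -> bool:
    """A single suggestion is worth showing only if it arrives before the breaking point."""
    return latency_ms < BREAKING_POINT_MS


# Hypothetical latency samples in milliseconds, invented for this example.
samples = [80, 95, 110, 90, 240]
print(pick_model("push"), meets_push_budget(samples))
print([still_useful(ms) for ms in samples])
```

With these made-up samples the median is 95 milliseconds, so the push budget is met even though one slow response would arrive too late to be shown.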
AM Yep, that's right. I'd be actually curious to hear what Stack Overflow is doing with AI. I saw some announcements, but I don't know if you can talk about it.
BP Uh-oh, are you turning the tables here? I think the most interesting questions for us are as follows: to what degree were all these really big AI systems trained on our data? Is there a way that these systems can then put resources back into the knowledge community that they're training on? That data is not commercially licensed. Can there be attribution and recognition and reputation for the humans who helped build it? And what about the tragedy of the commons? If people stop asking questions and providing answers on Stack Overflow and get everything from their AI, then who trains the next generation of AI? Or is each one its own little black box because there's nothing left in the public commons? So we're hoping with OverflowAI to do a couple of things. Some of it is just baseline, like let's add semantic search to lexical search. LLMs are great at that, let's make sure you can have a chat with our search and you get a synthesis of some answers plus some links to some answers. Some of it is IDE-focused, like what if your IDE could always offer you a suggestion, like you said, based on a Stack Overflow answer and then links to a couple of them if you want to dive deeper. And some of it I think is just the fact that the chat model is really great for enterprise search, so a lot of companies, big ones like Microsoft and Bloomberg, have spent years and years putting their documentation for all their codebase into Stack Overflow for Teams, so what if you could just talk to that like a coworker and it can give you the answer you need when you need it. Great value add. So I think it's both a challenge and an opportunity for Stack Overflow. We've talked about how it impacted our traffic and the way people ask and answer questions. It definitely introduces a lot of cool possibilities for us to add new features that are really interesting or new products that are really interesting.
And I think, like you mentioned, and coming from the open source world as you did, it raises some big fundamental questions about to what degree the exchange of knowledge should continue to be on an open free platform like Stack Overflow, which was founded so that you don't have to pay to get knowledge from Experts Exchange, versus needing to pay 20 bucks a month to use these features that are making life easier for you as a developer. So that's my manifesto.
AM Oh God, you bring up memories about Experts Exchange.
BP I'm sorry. It's triggering.
AM So I worked in IT throughout college and I remember setting up Windows networking and domains and you had to do all these arcane things with Windows for it to work. And you might be Googling something and you finally think you found the link for the answer. You click on that and there's the Experts Exchange paywall. It's like, “Ahh,” and I didn't even have a credit card to pay for it, so even if I wanted to pay for it I wouldn't know how to do that. And Stack Overflow was a game changer and really helped me both as someone who was asking questions and reading questions but also answering. I think it was very educational to figure out how to answer questions effectively. And the tragedy of the commons situation that you're talking about is somewhat worrying. Do we want the Internet to be actually locked down? Do we want a world where, because of the game theoretic landscape, we end up with a stratified Internet because we couldn't figure out how to create a positive-sum game and ended up with a zero-sum game?
BP Yeah. I think it's super interesting and I think Replit obviously has a role to play, both the way you talk about AI, the way you design your tools, and the fact that, like you said, being mobile first is bringing it to the next generation. It's the people who are in the countries that are growing fastest and still expanding and want access to this world as opposed to people who are sitting down at an expensive desktop. So I feel like more power to you. And does it have to be an either/or– can we have both? Maybe as we explore these neural nets more deeply and we come to understand them they'll be able to do great attribution. Or maybe when something novel emerges, they'll add it back. That's our hope for a Stack Overflow AI– that if a question emerges that the AI can't answer, great, now's our chance to add something new to the knowledge base. And maybe it's okay not to ask those same questions that have been asked a million times and marked as duplicate and closed. Maybe this helps everybody. The mods have less work to do but still new stuff gets added to that Library of Alexandria that we all want out there creating new stuff. And obviously it takes a long time at the moment, at least, to train a foundational model. So you can't ask questions of something where training data stopped in September 2021 and get what you need necessarily, so we still need people for that. But Ryan and Eira and I were all on a call yesterday with someone who works at a lab in UC Berkeley and they helped a woman who is paralyzed speak again and emote again. Her brainwaves have been decoded by a big neural net that can now understand what she's thinking and say them out loud. So I feel like we're living in a completely sci-fi time where just amazing jaw-dropping stuff is coming out of what AI can do for people. And I think we're navigating, like you said, how we make sure that this aligns with our values as humans, as opposed to maybe pushing us to the side.
AM Yeah. So when we were making our model, we were thinking about open sourcing and there's some business cases not to open source, such as actually having some kind of advantage and for a lot of our competitors not to copy it and all that. What we ended up doing is that the model that was trained on open source data, that's the one we're going to open source, and it was a very strong model for its size at the time. But we continued training on the Replit data and that we kept for the Replit users and the Replit business. And I think that's a good way of thinking about it– that if there's a foundation model trained on public data, maybe it needs to be in the public domain. But then if you have some data advantage as a business that you've been building for a while, you can apply that to your business. But there's a way to get ahead of these things and actually provide clarity for regulators and start a conversation, whereas if it's just a race to who can exploit more before the walls lock down, that's really a zero-sum game that they're playing.
BP All right, everybody. It is that time of the show. Let’s give a shoutout to Justin Joseph. They are asking a question about Vue.js. “Cannot read property ‘v4’ of undefined. I’m getting this error in Vue.js.” Well that was the question and Justin Joseph came in with a great answer, saved the question from the dustbin of history, and spread a little knowledge around the community, solved the error. Thanks, Justin Joseph. All right, everybody. Thanks so much for listening. As I mentioned at the top of the show, this is the end of Part 2. If you missed Part 1 and you want to check it out, want to hear more from Replit, go find it in the show notes. As always, I am Ben Popper. I'm the Director of Content here at Stack Overflow. You can find me on X if you want to shoot me a DM. You can hit us up, firstname.lastname@example.org, with questions or suggestions for the show. And if you enjoyed the program, leave us a rating and a review, because it really helps.
RD I'm Ryan Donovan. I edit the blog here at Stack Overflow, that's at stackoverflow.blog. And if you want to get me in my DMs on X, the DMX, I'm @RThorDonovan.
Eira May My name is Eira. I'm also on the Editorial Team at Stack Overflow, write for the blog, write the podcast show notes, and I am on social media most places @EiraMaybe.
AM My name is Amjad Masad. I'm CEO and cofounder of Replit. Go to replit.com to check it out. It's very easy to sign up and start coding. I'm also in most places @AMasad, like GitHub. It’s so strange to say X, but at least there's a lot of opportunities for puns here, but on X/Twitter.
BP I hope you enjoyed listening, and we'll talk to you soon.
[outro music plays]