The Stack Overflow Podcast

USB-C for all, PHP 4EVA, and what do LLMs actually know (if anything)?

Episode Summary

Ben and Ryan settle in for a wide-ranging discussion about whether large language models know anything, whether language ability is unique to humans, and what the end of the Hollywood writers’ strike says about the future of AI-generated content.

Episode Notes

Ben is watching AI Explained, a YouTube channel that covers the latest AI developments and their implications.

Read Ryan’s article “Do large language models know what they are talking about?”

Is language really unique to humans? New research suggests maybe not.

Not for the first time, Ryan recommends the work of Noam Chomsky: Why Only Us: Language and Evolution, an evolutionary account of language acquisition in humans written with Robert C. Berwick.

OverflowAI search is now available for alpha testing. Learn more here.

Good news for your cable clutter: Apple is switching to USB-C charging ports. Here’s when.

The WGA (Writers Guild of America) strike ended with an agreement that “allows for artificial intelligence as a tool, not a replacement,” but the arguments about creativity, copyright, and AI are far from over.

If you’re interested in working with PHP, head over to the PHP Collective and check out conversations like “Most useful new PHP features for version 8?”

Episode Transcription

[intro music plays]

Ben Popper Big topics in data architecture call for big conversations. Big Ideas in App Architecture, the new podcast from Cockroach Labs, invites innovators to discuss their experiences building reliable, scalable, maintainable systems. Visit cockroachlabs.com/stackoverflow to listen and subscribe. Make sure to use that link and let them know the podcast sent you.

BP Hello, everybody. Welcome back to the Stack Overflow Podcast, Ryan and Ben Edition, bringing you a little home-team goodness, talking over the news, everything that matters in the world of software and technology. What's happening, Ryan?

Ryan Donovan Oh, not much. Sitting around.

BP Not much, sitting around. Just getting ready for a day of farming on the old content.

RD There you go. Planting seeds. Incepting.

BP Tilling on the old content farm, exactly. So speaking of inception, brains, minds, and things of that nature, you and I like to stand on either side of the debate: do large language models actually know anything? I watch a great YouTube channel called “AI Explained” that goes through a lot of research papers and basically breaks down the latest research. And there was an interesting one: ChatGPT fails basic logic. You can say, “A equals B,” and then ask it, “Does B equal A?” and it will not get it. “Who is Tom Cruise's mother?” It will tell you. “Who is the son of this woman?” It doesn't have a clue. And there are other weird things like that. You can say, “What do you know about this remote island in Norway?” And it'll say, “I've never heard of it.” And then you can say, “I'm trying to think of this island that's at longitude x and is this wide and has this many people,” and it's like, “Oh, it's this island.” And you're like, “Oh, so you do know this island.” So in that sense, to your point, it doesn't know things the way we know things.

RD Well, to flip to the other side of things, I have seen things that sort of imply that language is the basis of thought, that without language, you can't actually think things. There was an NPR story about somebody who was deaf and never learned sign language, and they got into an ESL class when they were in their 20s, and somebody was trying to teach them basically the difference between you and I, and then asked them, “What was it like before you knew sign language?” And they said it was sort of dark and fuzzy and they couldn't describe it, they couldn't talk about it. So with the large language models, it may just be that language is enough.

BP I do think that language is critical for a lot of conceptual thinking, the kind you and I do. Obviously animals can be out in the world alone and learn a lot through experience, but to what degree of complexity does their internal thought get without language? It's hard to know. There was actually a really interesting article in The New York Times recently about language saying that we've looked for a really long time to try to find something in our brain or even in our throat that's special and explains why humans have great language and nobody else does. And no matter how hard they look and everywhere they look, the answer is that there's nothing different. The circuits in a songbird's brain that make its little language of 10 chirps are exactly the same as our circuits. Basically we just won the biological lottery. We happened to get to a certain tipping point of complexity and then it kind of snowballed from there. And if you don't teach people language, if they're not raised in that environment, like you said, maybe they really don't go very far at all. Without senses to input data and without somebody to teach you language, what can you know? Thought doesn't just spontaneously emerge in your mind.

RD So there's a book I read a while back called Why Only Us by Noam Chomsky and somebody else [Robert C. Berwick], talking about language and its genetic basis and going through the brain and all that. And one of the things in there is that we have a special region that lights up for language called Broca's region. And then there's also, in terms of the songbird, a specific nerve. I'm sure that the neurologists listening to this will correct me, but there's a nerve that connects to Broca's region that doesn't quite go as far in songbirds.

BP It's interesting, I'll send you the article afterwards and we'll put it in the show notes. Chomsky actually signed on to one of these later papers saying, “Okay, I see where the research is going.” He made a claim 30 or 40 years ago, and maybe there's some merit to both sides. So it seems like he's kind of come around. Part of the story was that they did not expect him to say yes. They were kind of shocked that he would accept that criticism. But he's accepted some of that criticism and continues to look at what's unique and what's not. But the flip side of this paper was that they never taught these GPT systems how to play chess. That wasn't part of their instructions; they just showed them language and had them learn language. And yet ChatGPT, I think it's even 3.5, not the most advanced version, got an 1800 Elo rating and is very capable of playing chess if you use a certain notation that it can understand. And there are more possible chess games than atoms in the observable universe. It can't have memorized the game because that's just not how it works. And so it has built, through its acquisition of language, some metacognitive ability to think through a chess game. But then they also pointed out that if you asked it a very similar question but set it up so it wasn't chess, if you were like, “I have these four pieces on the table and I want to remove this one piece, and this piece moves like an L and this piece only moves diagonally, how should I go about it?” it doesn't have a clue. It has to be within the confines of “let's play chess,” and then it knows how to do that.

RD I wonder. It's been trained on all this text and material, and it's been trained on Stack Overflow and its sister sites, and we have a chess site, so I wonder if just training on people talking about chess is enough for it to know how to play chess well.

BP Well, that's the question. If you say it's just a stochastic parrot, what it should be able to do is respond to a move that it's read before, but to get to that sort of rating it presumably needs some higher-level strategy, and that makes you wonder if somehow from the language it's absorbed some of the strategy and can reason about it.

RD And there's a site out there that is something like “checkmate in four moves,” and it's a database of all of the possible combinations.

BP I like that. Turing complete.

RD Chess is fixed. We're done.

BP We're done. One more thing just before we go there: Go, which has been around for thousands of years and which humans have played forever. Once AlphaGo started playing, it came up with completely new strategies that nobody had ever seen, and now the game of Go is different and humans play it differently. So that was kind of like, “But it's not solved.” We thought we kind of understood how the game was played and you could have exceptional strategy within those confines, but the way the AI approached it with this alien intelligence opened up a whole new way of thinking about Go, which is kind of cool.

RD Yeah. There's a Google spreadsheet I saw that catalogs all the weird stuff AI has done when it games the rules. There's one where it's a physics simulation and it's like, “Build the fastest possible thing in this,” and it built a very large tower that fell over. And that was the fastest, according to the rules.

BP I love that. “Make me the richest person in the world. Oh, I didn't mean for you to eliminate everybody else.”

RD That’s right.

BP So speaking of AI, I just wanted to shout out a couple things on Stack Overflow. We’ll include the links in the show notes obviously. We have something up on the blog and on our Labs page. You can apply to get into the alpha of Community Search, which is basically an evolution of Stack Overflow Search. You will get instant summarized solutions aggregated by Gen AI and links to the questions that it drew from as its ground truth so that we can make sure we reward and respect and recognize the community members who wrote them. You can ask follow-up questions in a chat-like format. So it's not open to a ton of people, but you can get on the waitlist, you can read about what's happening, and it's definitely something we're playing with internally that's a lot of fun. Have you played around with it at all, Ryan?

RD I haven't, but I think it's a pretty exciting development. You no longer have to do the magic keywords to get what you want.

BP Yes, no more magic keywords, exactly. And one other thing I wanted to shout out on Stack Overflow, we're going to have a blog about this soon. We've had these things called Collectives for a while, which are sort of areas where you can go to discuss a specific topic. If you're really focused on AWS, there are AWS experts there, and now we have one about CI/CD, for example. And we tweaked the rules a little bit so it doesn't have to be just Q&A; there can be discussion posts. And so that's kind of a new format for the Stack Overflow corpus. It's always just been, “Hey, you need to ask a question in a specific way.” And so there have been some great discussions there. They're open-ended, people are having debates. For example, there was one on the CI/CD Collective that's sort of rethinking infrastructure as code from scratch. Does it have to be this complex? I guess somebody had read a book and it kicked off a thought in their mind of what could be done to reduce infrastructure-as-code complexity. And so then a lot of people chimed in, and many of these folks have many, many years of experience in the world of software engineering, so it's cool to hear from them and have them share their ideas.

RD Yeah, these kinds of discussions between people who know a lot of stuff are some of my favorite things to read on the internet. I do like going to Hacker News and listening to people argue there. I learn more about how things work and what the particular pros and cons and arguments are. Love your comments, comment on the blog.

BP Comment on the blog with your real Stack Overflow handle and we'll engage with you. All right, important news for my box of cords and cables: Apple is switching entirely to USB-C and getting rid of its proprietary Lightning port. This was a regulatory change, Ryan? What happened here?

RD So I think there was a European regulatory agency that sued them, a European Union mandate that required mobile devices to adopt USB-C. They required interoperability. So I think USB-C is a good standard and Apple has been putting out ridiculous random standards every couple of years.

BP Yes. As we sit now, I'm on a Mac with a Thunderbolt port from the monitor going to an adapter, and then over here I have an adapter from USB-C to regular USB because the Mac doesn't bother including regular USB-A ports, and I just don't want to be dongle guy anymore. I don't need to be Mr. Dongle anymore.

RD Yeah. I remember at my last job I had a Mac laptop and I left the charger cord at home on the trip, so I went to the store and had to get this specialized $80 charger cord and that's ridiculous.

BP Hey, that's where those sweet, sweet profit margins come from, buddy. Don't touch those.

RD Oh, buddy. Design ain't free, right?

BP Yeah, exactly. Form and function.

RD I think we're starting to see a little teeth in some of the tech regulatory stuff and I think that's a good thing. Getting better standards, having everybody use them, having those standards be better instead of having your particular thing. I think that's why VHS won over Betamax. It was an easier standard.

BP And I certainly think from an environmental perspective it's great. I've been dealing with moving my in-laws recently and they have a lot of old stuff, and it's just so many cords that are useless. Whereas if we can all get on USB-C and that lasts for 10 or 20 years, I can recycle these cords, I can use them with new devices. It doesn't have to be that any device over five years has its own bizarre proprietary charger that's only good for that device. We don't want that. We don't want to live in that world.

RD Respect to Apple for their security practices, but that's why I don't use iPhones.

BP Ooh, I'm talking to a Windows man here.

RD I know. That's right. I build my own computers like Henry Cavill.

BP Do you paint your own minifigs too?

RD Oh, no.

BP Love that Superman is a super nerd.

RD That's right.

BP All right, last little bit of news here. Pretty interesting: Getty Images is getting into the generative AI game. They sued a generative AI company when it became obvious that a lot of their copyrighted images had been used to train it. You could see the Getty Images watermark appear. This weird ghostly version of it would show up in the output, and it became clear. But they've also recognized, “Look, this technology is coming, we can't ignore it.” And so there's going to be a version with their subscription where you can ask for something. It's been trained on all their licensed stock photography, and therefore, when you get an image out of it, you know that it was generated from licensed content that Getty owns. And having worked in the news business for a long time, this makes a lot of sense to me. We used to have Getty at The Verge for our stock imagery. iStock and Shutterstock and all that stuff you and I have used, sometimes even in an abstract way. It used to be, “Hey, I need an image that is evocative of robots or cybersecurity or innovation,” and they have some illustrations in there, so it's cool that now you could come at it with a more detailed, evocative, precise prompt, and it could generate options on the fly. And in the future, I'm sure that as these tools evolve, you'll have inpainting and aspect ratios and all different kinds of stuff that you can use. So what are your thoughts on this, Ryan?

RD So I think that this is interesting in terms of the legal aspect of it. They ruled that AI-generated images couldn't be copyrighted, but if you own the entire dataset that it's trained on, are the resulting ones also copyrightable?

BP Yeah, it's something that's still sort of working its way through the courts and is a murky area. And there are a lot of companies, ours included, where if we're going to put something up on our site, we don't want any legal ambiguity about it, so we prefer to pay for a license and then use that tool as opposed to grabbing something where the onus is on us in the future to assume the copyright liability.

RD I mean, there's also the human story. People are creating all these images that it's trained on. Same as Stack Overflow, people are creating all the questions and answers. And I think this is a weird step for people. Are people going to stop selling their images to Getty Images now? Are they going to say, “Oh, you're just putting this in the sausage maker.”

BP Yeah, that is a great question. To what degree does it devalue the photos, because now a license comes with the ability to maybe generate infinite variations of whatever my imagination comes up with? I guess we'll see how they set their license up. Maybe it's that you get 10 AI-generated prompts a month for this amount or whatever. But I certainly think it raises the question of what happens to photographers and illustrators and copywriters and people whose job a Gen AI can now do pretty well. One thing we saw recently that definitely reflected on this was the writers' strike in Hollywood. They walked off the job for almost 150 days and they got concessions from the studios about not allowing writers in the writers' room to be replaced by AI. So actors are next; they're still working on securing their digital rights in perpetuity, but these things are playing out.

RD I mean, Bruce Willis already sold his digital rights.

BP Well, if you want to sell your digital rights, be my guest. What people are saying is basically that when I appear in your film, I'm not granting you the right to then remake me in another movie with my voice and my face, unless I say so.

RD Right. They were trying that with extras and just being like, “Here's 150 bucks, we get to use you as an extra in any movie we want digitally.”

BP I mean, that's the slippery slope. You could always find someone to cross the picket line probably and so how do you do that? And these unions exist to sort of say, “Within this industry we have certain rules that we've agreed on to protect workers. You can't go out and find somebody else to break these rules.” So we'll see.

RD There's already folks creating fully artificial pop stars and avatars and it'll just be fully artificial actors and actresses and it'll just be one guy with all the money making everything.

BP No, it won't. I mean, even before Gen AI, one of the most popular pop stars, I forget if it was in Japan or Korea, was a holographic pop star, and it was extremely popular for years. It was just created by people, but they didn't need a physical person. And people now have AI chatbots that have become their surrogate boyfriend or girlfriend, and they text with them and form a deep emotional bond with them. I mean, look, let's go back in time. If you're more obsessed with your stamp collection than you are with human beings, if a telenovela is more meaningful to you emotionally than your spouse, you don't need AI to become attached to something inanimate. That's part of being human.

RD Yeah, but now that your telenovela can talk to you or your stamp collection can have deep philosophical conversations with you, that's going to change things.

BP Right.

[music plays]

BP All right, everybody. It is that time of the show. Normally I shout out a Lifeboat, but we don't have any fresh ones, so like I mentioned, there are now discussions on our Collectives. If you are interested in working with PHP, you can head on over to the PHP Collective and check out a discussion: “What are the most useful new PHP features for version 8?” Tons of folks are weighing in on what they think are the most useful features. I'll be sure to include the link in the show notes so you can check it out for yourself.

RD PHP lives.

BP PHP lives. Never die, kill it with fire. I am Ben Popper, Director of Content here at Stack Overflow. You can find me on X @BenPopper. Email us with questions or suggestions, podcast@stackoverflow.com. And do me a favor, leave us a rating and a review because it really helps the show.

RD I'm Ryan Donovan. I edit the blog here at Stack Overflow. It's at stackoverflow.blog. And you can find me on X @RThorDonovan.

BP Thanks for listening, and we will talk to you soon.

[outro music plays]