The Stack Overflow Podcast

When AI meets IP: Can artists sue AI imitators?

Episode Summary

Ben and Ceora talk through some thorny issues around AI-generated music and art, explain why creators are suing AI companies for copyright infringement, and compare notes on the most amusing/alarming AI-generated content making the rounds (Pope coat, anyone?).

Episode Notes


Getty Images is suing the company behind AI art generator Stable Diffusion for copyright infringement, accusing the company of copying 12 million images without permission or compensation to train its AI model.

Meanwhile, a group of artists is suing the companies behind Midjourney, DreamUp, and Stable Diffusion for “scraping and collaging” their work to train AI models.

One of those artists, Sarah Andersen, wrote an op-ed in The New York Times about seeing her comics twisted into far-right memes and then gobbled up by AI models.

Speaking of copyright violations, did Vanilla Ice really steal that hook from David Bowie and Freddie Mercury? (Yes.)

Check out the AI model trained on Kanye’s voice that sounds almost indistinguishable from Ye himself.

Read The Verge’s deep dive into the intersection of AI-generated music and IP/copyright laws.

Watch the AI-generated video of Will Smith eating spaghetti that’s been called “the natural end point for AI development.”

ICYMI: The Pope coat was real in our hearts.

Columbia University’s Data Science Institute recently wrote about how blockchain can give creators more control over their IP, now that AI-generated art is clearly here to stay.

Congrats to today’s Lifeboat badge winner, herohuyongtao, for answering “How can I add a prebuilt static library in a project using CMake?”

Episode Transcription

[intro music plays]

Ben Popper Hello, everybody. Welcome back to the Stack Overflow Podcast, a place to talk all things software and technology. I'm your host, Ben Popper, Director of Content here at Stack Overflow, joined as I often am by my wonderful collaborator, Ceora Ford. Hey, Ceora.

Ceora Ford Hey, how are you?

BP I'm good. So I'm excited for today's episode. In the world of technology, we're in a new hype cycle. All anybody wants to talk about is AI this, large language model that. But you are bringing it back. You're connecting today's hype to yesterday's hype. You shared a great piece with me. Set me up. What are we going to chat about today?

CF Okay, so this is really exciting for me and a little bit, I think, surprising because I feel like I've been the person who's kind of rained on everybody's parade with blockchain and AI.

BP You sure did, but now that it's not hype you're ready to embrace it.

CF Now, I feel like this is not the first time, but it's one of the few applications of blockchain that I actually agree with. So this article is entitled, AI Art is Here to Stay: How Blockchain Can Help Creators Gain Control Over Their Work. It's basically a retrospective kind of article written by someone from the Columbia University Data Science Institute. And I found this article to be fascinating because a few days ago, or maybe more like a few weeks ago, I was on Twitter asking if there is any way that artists can protect their work from being used in training data sets for AI art, because I'm sure there are some artists out there who would prefer that not to happen. So how can we prevent that from happening if it's something they don't want? Essentially, that's a very, very hard problem to solve. And the article kind of discussed that, because there are so many different factors, one of them being that it can be hard to prove that an AI image was trained on your art and replicates your art style. That's a hard thing to fix and to prove and to detect. There's no tool out there yet where you can input an image and say, “Tell me if this is the real deal or not.”

BP Because it's not an exact copy. It's like a fuzzy sort of statistical resemblance of it. Even if I say, “Make this in the style of Picasso,” it looks like a Picasso, but it's not a copy of a specific one. I will say there's one very funny thing, which is that one of the AI services ingested a lot of Getty photographs that had watermarks on them, and so if you asked for a specific thing, like a football star about to score a goal, you would see it, and there would be the ghost of a Getty Images watermark because it had learned that over time. So they have a pretty good case, I think. They'll be able to make the case.

CF Honestly, the article also did mention that watermarks could be a solution to this problem, especially because it doesn't only talk about AI art. It also talks about AI-generated images in general, because some of them are getting really good at replicating real human beings doing things they never did. So one of the things they mentioned was watermarking your art or your photography so that people can know when it is or isn't real. The interesting thing about this article, though, is a tool they pointed out called The Glaze Project, and this is probably the most exciting thing that I learned about from this article. The Glaze Project is basically a tool where artists can input their image, and it will make very small changes, down at the pixel level, that the human eye can't recognize but that throw off AI art generators. So when they try to mimic your art style, they'll produce something that's very, very obviously not yours. I'm not sure how it works under the hood, because I've read through the website and they don't really talk about the technical side of things. But the whole idea, too, is that even if you're a person who's done a ton of art, and there's so much of your art already publicly available that AI has already been training on, if you continue to use Glaze from here on out, eventually it'll get to a point where the data set will have so much of your art with the Glaze. They call it kind of an overlay, I guess.

BP Yeah. I've seen adversarial things like this for AI before. One of the earlier things that AI learned to do really well was identify an image. “This is a picture of a cat.” “This is a human face in a video feed. I know what this is and I'll track it,” or whatever. And people would make these really funny sweaters or little QR codes, and if you just held one up, it threw the whole thing off. It said the person was a panda, or it couldn't recognize the person. So there is some way within the language of pixels, exactly like you're saying, to kind of trick the machine. And it's really interesting thinking about doing that so that your stuff can't get scanned even if it gets ingested. But I think there's almost a bigger question here, which is: how quickly are people going to start demanding new rules? A bunch of artists are suing a bunch of AI image generators. Like I mentioned, Getty Images is suing, and coders are suing over open source code being used. But in the meantime, while we wait on these cases, the AI systems are plowing ahead. In the world of search, for a very long time you've been able to say, “Do not crawl my site.” So big search engines don't crawl Facebook and index that copy, because Facebook wants to keep the data for themselves. It makes perfect sense. And lots of others, like Craigslist: don't crawl my stuff. If people want to know what's on Craigslist, they’ve got to go to Craigslist. Makes total sense. So will we in the future see something similar that's like, “Do not train on my data,” just like ‘do not crawl’ for search engines? And like you said, how would we know? Well, you'd have to make a legal case and say, “I demand discovery. I want to see everything that was in your data set, and if my stuff was in there without my permission, we've got a problem.” But I guess the open question has always been: is it fair use to learn from the data and then produce novel art?

CF Yeah. And I think, too, the article mentions that there's also an issue of proving ownership, which is where blockchain comes in. The whole cool thing about NFTs was that you could prove on the blockchain that an NFT was yours, no matter if someone took a screenshot or whatever. So the article posits that people could begin to use blockchain to do the same thing for their art, so that they have very clear proof that “this is mine and this AI used my art,” and in the future people can prove ownership, which I guess would help. And in situations where they don't want their stuff to be used, I think what you're describing is the natural progression, or the natural solution that we need to get to. It's interesting, because I think to get there, we would need a partnership of artists and developers working together to build something that is technically sound and works, but is also easy to use. Because even blockchain, I think, is a relatively technical thing. I'm sure most artists would be like, “Okay, how am I supposed to get my art on the blockchain?” So the whole idea is, “Okay, we can use technology somehow to protect artists, but we have to make sure that it's accessible and easy to understand and affordable,” all that kind of stuff, which I think is another thing that we have to figure out. But honestly, for the first time ever, I'm excited. I'm excited about the possibilities of building something that could actually work, be sustainable, and help solve this problem, because I'm for the artists and I want everything to be ethical. I'm slightly excited about AI, but this is the kind of stuff that makes me not so excited. So I feel like if we could come up with solutions to these kinds of issues, it'll make it much easier to build ethically with AI.

BP I’ve got to share this thing with you, which I think falls into the same category. I was talking with my friends about what AIs can do now and what they can't do, and one of the things we discussed was that we haven't really seen great music. You can tell Midjourney to make a great image, and it's kind of stunning. You can say, “Write an essay,” and it writes a college-level essay. But I haven't heard AI compose a song where I was like, “Oh, this is a hit song.” Then two days after I said that, I saw this tweet, which I just put in the chat for you. Basically, for any artist out there whose voice is very unique and who has lots of music, you can now train an AI model and it will learn all the nuances of their voice. Then you go in and write eight new bars of rap, and now you have a new Kanye song that, if I were just listening on the radio, I'd be like, “This is Kanye. It sounds like him.” And that poses such a big challenge, because even more, I think, than “Give me the style of the artist,” this has a human quality where you're like, “Oh, I recognize this individual person. I can pick out the details of their voice.” And I don't know if the sound of your voice is something you can copyright. There have always been Michael Jackson impersonators, and cover bands that go out and perform the hits of The Rolling Stones or The Beatles and sound and look just like them, and that's fair game. So I really don't know what happens with this stuff, but it's wild to me how indistinguishable from a great human imitator, or the real person, this sounds.

CF Yeah. I think in a previous episode a little while ago we discussed... well, today the theme is AI art, but we did discuss AI-generated audio. And I told you about the person who had made audio of Beyoncé talking about BTS, which she's never done. To me it didn't sound exactly like her, so I could tell that it was AI, but I know that it's possible for it to get to a point where you won't be able to tell, just like how AI art can now do hands well, when that was the telltale sign before.

BP Yeah, you were always like, “But the hands!” and now you don't even have that anymore.

CF Yeah, and we don't have that anymore because that's how AI works. Over time it gets better. So this is something that I think about a lot, especially for an artist or a musician, or even someone famous who gives a lot of speeches, like a politician. If they've passed away, we could possibly have a new Michael Jackson album produced by AI, which to me is nuts to think about. And again, I have no idea how we're supposed to have anything that's going to prevent this from happening, or allow it to happen in an ethical way. I don't know.

BP The music industry, more than any other industry, kind of has a good set of rules. The writer gets a credit, the producer gets a credit, the artist gets a credit, and if you're going to do a cover of a song and publish an album, you're going to pay these royalties. They kind of have a whole system set up. So I wonder if we could start to set that up: “Look, you can make a new track with Kanye's voice, but he's going to get 20% of the profits,” or whatever. I'm not saying that that's legal, just that the music system kind of has created rules for this. And famously, with Vanilla Ice and David Bowie or whatever, how many notes can you copy before you owe somebody royalties, or how many notes do you need to change? I guess that's the big question that these AIs pose to us. There were things that, until the rise of these recent models, only humans could do; a machine's attempt wouldn't fool anyone. And now the Turing tests, the areas in which we can be fooled, from essay writing to songwriting to painting, have changed rapidly and dramatically. Do we feel safe getting fooled like this? Are the people whose data was used to create these new systems owed money? And as we go into this wave of new creation, how do we make sure that it's safe for society and that there's room for us, the people, the humans, to continue to participate?

CF Yeah. I even think about this: if somehow in the future we have laws and regulations, or whatever the case may be, to limit what we can and can't do with AI, even then, if someone produces AI-generated audio of some politician saying something outlandish, even if it's illegal, if it gets on the internet and spreads fast enough, it's basically going to become law. It's going to become the truth, because so many people will have seen it by the time it gets taken down that, even when we try to retract it and say, “That wasn't real,” it could still be widely believed.

BP Don't you think we've already passed that point? I feel like we were already living in this post-truth world of ‘one side believes this and the other side believes that,’ and sometimes, until you took it to court, two camps would just live in two totally different realities, because of the internet, because you could live in your filter bubble or see something that's been edited.

CF Oh yeah, absolutely. That goes along with the fake news thing, of false information just spreading rapidly. I think pictures and audio make a much stronger impression on the human mind than text does. Tweets with false information are already extremely damaging, but imagine pictures and videos and audio and stuff like that. And it's not necessarily a new issue, because that's been a thing on Facebook and Twitter and all the other social media platforms, but I just feel like AI is exposing a lot of the issues we already had with how the internet works. And I think this kind of gives me hope.

BP So I shared the Kanye one with you, and you can listen to that and tell me for yourself how much you think it sounds like him. Then I'm going to share another one with you. Video is the frontier we haven't totally cracked. AI can make video, but it's kind of weird and distorted, the way the images were one or two years ago, which to me means, “Okay, pretty soon.” I can see the path we're on. So I'm going to send you this to check out. I know that it doesn't look real, but there are little bits of it that look kind of real, and you can sort of see how, with just a little bit more work, it could get there. We’ll put this in the show notes.

CF They're almost there. I don't know if you've seen, there's a Disney show. I think it's called The Prom Pact or something like that. It's a new show and their extras are AI.

BP Oh my God, that's too much.

CF And I don't know where the tweet is now, but they look terrifying. The people in the stands of the basketball game or whatever look so scary if you look close up, but it's accurate enough to scare you, and accurate enough for you to think that it could possibly get to human level in the future.

BP Yeah, I'll put this in the show notes. It's Will Smith eating spaghetti, as imagined by one of the AIs that makes videos. And it looks bizarre. It doesn't look great. But then within there you get this little hint of, “Oh, for a second that really does look like Will Smith eating spaghetti.” So give it another 6 to 12 months.

CF Yeah. It's like how deepfakes a while ago used to look like deepfakes, but now a lot of them look really, really accurate and real.

BP Oh, yeah. I mean, that one of the Pope fooled everybody. That looked like the Pope in a drippy coat. Everybody loved it, because it looks like a photograph. It's photorealistic now.

CF Right, yeah. I don't know if you've heard on TikTok, one thing that's really popular now is this audio of Joe Biden singing an Ice Spice song. And obviously it's fake, but it does kind of sound real enough for it to be scary.

BP Well, this is what I'm saying. What was Saturday Night Live? It was all these great impersonators, and we loved it. And what is Weird Al? Great imitations. That kind of joke and that kind of content is allowed; that's satire. It sucks that now the AI can do it and we're kind of cut out of it, but it can also bring some joy to people's lives. I don't know.

CF Yeah, I think it could be good, because I think the Joe Biden thing is funny, but I also know that it's fake. And if someone on SNL is acting like the president, I know that it's fake. But AI, if it gets too good, will we know that it's fake? And how can we prove that it's fake and all that kind of stuff? I don't know.

BP Maybe I've said this to you before, but the only sort of counter-argument is how soon until people just start being like, “Unless I see it in person. It's got to be IRL or nothing,” because we're all going to get punked enough times where we're going to be like, “I can't believe anything I see on the internet now.”

CF I think a lot of people are already at the point where they don't trust anything they read or see on the internet unless it's extremely, extremely verified. I think blockchain could help. I can't believe I'm saying this, first of all, but I think blockchain could help with this. I know that for NFTs you could check very easily who owned the image or the NFT. If we could get to a point where we do something like that for images and audio, where you can very easily check who owns it and see whether or not it's real, that would help.

BP It used to be that a digital artist would make something and just publish it on the web and say, “Hey, I have the copyright.” Now it'll be a bit more strategic, like, “When I publish something, it's got an invisible watermark and the AI can't scan it, and I've already put it on the blockchain, so I've got my defenses up. I'm ready for somebody to try to take this, and I'm ready to stake my claim to it.” Last thing before we go: I just want to mention that this episode will come out after we've made some announcements, but Stack Overflow has to think about all this stuff, and for us there are some really interesting possibilities. We discussed them in a recent blog post about how the key for us, just as it is with these artists, is that creators are able to receive some recognition or some compensation, and that all the data on the internet is not just being gobbled up into these black-box models, which spit out a sort of statistical average of an answer while nobody visits the websites or rewards the creators. We think that’s really important here at Stack Overflow, and our CEO has given interviews to Wired and the Wall Street Journal explicitly saying that we need to figure out a way, perhaps, to license our data, and that any money that comes in from licensing that data is going to go to support the community of Stack Overflow users who have been contributing their knowledge over the last 15 years. So there are some really useful applications of this, especially within the world of software, but it's all moving too fast. I need a t-shirt that says, ‘It's all moving too fast,’ because that's the way I feel.

CF Yeah. I think for this, when all the crypto and blockchain stuff was the hot topic, I was struggling to see enough positive uses of this to make all of this worth it, but I don't feel that way with AI. I feel like there's enough positive things that could happen with it that we really should try to fix the negatives so that we have mainly positives. That's what I'm hoping.

BP Very well said. With blockchain it was like, “Yeah, maybe it'll change the world, but I don't really see it happening.” With AI, it's already so smart, and it might get smarter really fast. We’ve got to pump the brakes so we can do the good things with it. Did you see the thread about the person whose dog was sick and they couldn't figure out the diagnosis? Did you see that thread?

CF No, I didn't.

BP The dog was sick. The first vet gave a diagnosis, but it didn't really help. They were given medicine, but the dog wasn't getting better. They took the lab results, just the raw lab results, put them into the AI, and said, “What do you think?” And the AI said, “Well, it could be this.” That was the first diagnosis. “It could be that,” and the second one didn't make sense. “Or it could be this third one.” They called up a vet and said, “Do you think it could be this third one?” The vet said maybe, they prescribed medicine, and the dog got better. That's the kind of thing where it's like, that's what it is. It's a reasoning agent that can look at text and data and come up with some ideas about it. In that context, it's super useful.

CF Yeah, agreed, agreed. That's really cool.

BP Well, everybody, if you want to check all this stuff out, we'll be sure to put some links in the show notes to The Glaze Project and to the bigger article from Columbia University's Data Science Institute about what blockchain can do. And then a couple of great links you're going to have to check out: a new Kanye song not from Kanye, and Will Smith eating spaghetti. Don't miss it.

[music plays]

BP All right, everybody. It is that time of the show. Let's shout out someone from the Stack Overflow community who came on and helped to share some knowledge and saved a question from the dustbin of history. Herohuyongtao, thank you so much. Awarded March 29th, “How can I add a pre-built static library in a project using CMake?” If you've ever wondered, Hero has an answer for you. They've helped 35,000 people and earned themselves a Lifeboat Badge, so congratulations, Hero. I am Ben Popper. I'm the Director of Content here at Stack Overflow. You can always find me on Twitter @BenPopper. Email us questions or suggestions, podcast@stackoverflow.com. And if you like the show, leave us a rating and a review. It really helps.

CF And my name is Ceora Ford. I'm a Developer Advocate at Auth0 by Okta, and you can find me on Twitter. My username there is @Ceeoreo_ and you can also find my website, Ceora.dev.

BP Yes, by the time you hear this I may have lost my blue check mark, but it's really me, I swear. I'm not famous enough to be impersonated and I'm not paying for a check mark.

[outro music plays]