The Stack Overflow Podcast

On the web, data doesn’t define us. It creates us.

Episode Summary

In this episode, Ben interviews Jannis Kallinikos, a professor at Luiss University in Rome, Italy, about his new book Data Rules: Reinventing the Market Economy, coauthored with Cristina Alaimo. They discuss the social impact of data, explore the idea that data filters how we see the world and interact with each other, and highlight the need for social accountability in data tracking and surveillance.

Episode Notes

Jannis Kallinikos is a coauthor of Data Rules: Reinventing the Market Economy (MIT Press, 2024) with Cristina Alaimo, which lays out a framework for a new social science focused on the socioeconomic changes driven by data.

You can read an excerpt from Data Rules on our blog here.

Explore more of Dr. Kallinikos’s work.

Shoutout to Lifeboat badge winner Ebrahim Ghasemi for answering What is the structure of an application protocol data unit (APDU) command and response?

Episode Transcription

[intro music plays]

Ben Popper Hello, everybody. Welcome back to the Stack Overflow Podcast, a place to talk about all things software and technology. I am Ben Popper, Director of Content here at Stack Overflow, and today I'm going to be chatting with Jannis Kallinikos, who is making a case in his book, Data Rules: Reinventing the Market Economy, from MIT Press, that the revolution brought about by digital data has kind of created a whole new reality in which the old rules don't apply. So we're going to chat about social impact of data, how new technologies are making data more portable, less context dependent, and how datafication expands the spectrum of economic possibility. So without further ado, Jannis, welcome to the Stack Overflow Podcast.

Jannis Kallinikos Thank you, Ben. It's a pleasure to be here.

BP For folks who are listening, tell them a little bit about yourself. How did you get into the world of software and technology and what led you to write this book?

JK I am now a professor at Luiss University in Rome, Italy. But prior to coming here, I was a professor at the London School of Economics in the Information Systems Department, which then became part of the Department of Management. So for more than 30 years I've been dealing with issues of information, data, technology, and I would say the way they impact human and social practices. That's my take on the matters. I'm not a technician, but someone who investigates how these complex and sophisticated technologies impact our lives, impact organizations, and businesses and governments.

BP Right. So from where I sit, over the last year and a half since the advent of ChatGPT, the big change in the world of data has been that we understand now that these large language models are training on the internet. Any public data that they can find, 10 terabytes or more worth of data, they're ingesting all of that and using it to create this language engine, this reasoning engine, this natural language conversation agent. And for Stack Overflow, we've recently gotten into the world of API licensing where companies are coming to us saying, “We want to utilize the knowledge created by your community to help train our AI models,” and this is a new world for us. Some of our users are into this, some of our users are not into this. They want to know what kind of privacy PII about themselves is going to be transported with the data and they want to make sure that, as these AI models eat up more and more data, that the humans who created the knowledge get recognized, attribution, citation, and that some of the resources or the value that comes from the use of these AI models flows back to the knowledge communities and the humans online who created it. So that's where we're sitting. Let me ask you, as you set out to write this book, what were the most pressing issues that you saw that you wanted to frame and put people's attention on?

JK We completed the book ahead of the emergence and diffusion of large language models. But I believe, and I want to claim, that most of the things we say in the book apply very much to the case of large language models and the ways artificial intelligence is developing. Now, there are many motivations for writing this book, but one straightforward motivation is the profusion, if you want, of data in our society, in our organizations, and the way these data impact, as I mentioned before, the way organizations work and the way platforms emerge and work, and platforms are a kind of different type compared with traditional organizations. So this was a straightforward motivation, but there was another one, I think, Ben, that links to what you told me before. We were quite dissatisfied with the widespread view of data as just technical items used to achieve computations, train algorithms, or be used in data analytics and big data analytics just ahead of the machine learning revolution. And we were dissatisfied knowing and observing empirically that data are not just that. They are obviously data points, standardized inputs to data calculation, but they are more than that. They are instruments through which we filter how we see the world and how we interact and communicate with one another. And these aspects of data were being left out of the discourses we have had about data, and that was therefore an important motivation for writing our book.

BP Right. One of the things I remember from my time as a journalist that I think maybe speaks to what you're saying is a famous story in The New York Times, or maybe it was The Wall Street Journal, that XYZ big tech company knows you're pregnant before you do. They have all these different data points about what you're searching, how you're behaving, how you're sleeping, where you're traveling, and suddenly you see this ad pop up for something related to pregnancy, and it's put together this matrix of data points that are very personal, that are very intimate, that are very sensitive without you really realizing it. So in that sense I totally agree. This is not just, “Hey, we need to crunch some numbers in the mainframe to make sure that the financial algorithm goes better.” As we moved from the era of personal computing to social media, we provided a ton of very personal data. As we moved to mobile, an incredible amount of data about where we're moving. And now, with people wearing devices, even about how our bodies are working day to day. So in that sense, I completely agree with you and I do think Europe has really been on the forefront of pushing for better data privacy regulations and the United States has followed them because Europe is a big enough market that it made sense. And I've been through many GDPR headaches on my side of the fence as a marketer, I can tell you. We pay attention to it and the legal team cracks the whip. So we do try to stay true to those ideals, even if sometimes they can feel a bit onerous.

JK Absolutely, Ben. These are absolutely vital concerns. We can debate what kinds of regulation we need, how we should relate to data, how our personal lives will cross with the life of institutions, and all these are absolutely important. But we also say something important in our book. Those discourses on privacy and surveillance do not adequately reflect the multiplicity of data as markers of reality, as instruments of knowledge building and media of communication. They just think about how data can be used to track us, and this is an important issue of course, I admit this. But it is more complex than it appears at the surface.

BP Right. It isn't just about tracking us, but about defining us. And so another way of thinking about that, going back to my journalism days, might be that there are attempts at predictive policing. If I know these three or four or five data points about you, well now I think that you're somebody I should keep an eye on. I've decided on your persona. Or you're applying for a job or you're applying for a home loan. Well, I've gathered these three or four or five points of data and I've made a judgment about you as a consumer and somebody who is worthy or not of getting the loan they need to start a business or own a home. Is that what you mean about where identity and data fuse?

JK That’s what I mean, Ben, but I also mean that these four or five things can become six or seven, and that matters. How we mark the people on social media and on the internet, what we use as markers of people's behavior is not nature; it’s being made. And this is an important matter for discussion. It's not that I use a click on the internet or on social media, and that is me, and I'm doing this and this and this. This does happen, it's being used abundantly, but it reveals only aspects of what data can do, and hides others that data could have done and are not doing. With the example you mentioned before, if I add more data into how criminal behavior can be predicted, another picture may emerge as compared with the previous data, which gave a poorer indication of how criminality happens and how criminal behavior occurs, in which context and all this.

BP So maybe you could give me some of your favorite anecdotes from the book that point to things that are troubling to you or exciting to you and whether you want to look at this as glass half-full or glass half-empty. What are some of the anecdotes in the book that you found most compelling and then maybe we could talk a little bit about, given that, how do we adapt and what are good solutions or what's the right path forward. But let's start with your favorite anecdotes.

JK We used several examples to debate the critical issue of how data are made rather than found. This is important. And in making our claims, we of course need to give examples so people understand and are also able to judge the validity of our claims. One example is music listening on social media, or listening and viewing on streaming platforms in general. For instance, how is listening being defined as listening on social media? It's not straightforward. Is listening for one minute enough to qualify as a listen of a track by User A, and does that matter? In the case of music, there are tracks of two minutes as compared with symphonies of 35 minutes or one hour. How have the decisions been made to record a user as listening to this track? These decisions are not arbitrary in the sense that they have a rationale, but they are arbitrary in the sense that they have been decided by people, and have been decided on the basis of some technical issues, what is easier, what is reasonable, and all that. But there are decisions behind how a listening or a viewing is recorded as such. This is one example. We also have several other examples about how some banking institutions use data to shape their actions and how these data could have been different if, for instance, other choices had been made. These are some very quick and dirty notes about the fundamental idea, Ben, that I want to insist on and that goes through our book: that data are not found, but are made.

BP I think I understand what you're saying. I'll give an example from my own life and maybe you tell me if it fits into the thesis that you're laying out. When I was a journalist, I wrote a story about how police were investigating these teen gangs in Harlem, and what they had decided was that if you were part of the same Facebook group, or if you commented or liked a certain picture that included a known gang member, or let's say you listen to music that had been published that related to this gang activity, then you belonged on an official list of a known gang-affiliated person. And that's a legal status– that's something that would determine, well, if you went to court, will you get bail or not? So that data is made, that judgment is made based on your social media activity that now has ramifications for you as a young person and your interactions with the police and the court system. Is that kind of what you're getting at?

JK It's an excellent example, Ben, absolutely. That's what I'm trying to say. And we spend a lot of time raising the issue and raising awareness among you journalists and other people, but also among us academics, that the way data has been made needs to be made socially accountable and debatable.

BP Gotcha.

JK It's not something natural, and this is hidden from the overwhelming majority of the discourses around data today.

BP In that context, what are, in your opinion, the most important examples of this? The one I gave is obviously important in the localized context of New York. What are the ones that you think about at a nation-state level or a global level or the global economy level that you're particularly concerned about?

JK The overwhelming use of data by companies today is to use a number of tricks to mark and transform the behavior of users, either on transaction platforms or on social media platforms. They use several tricks and several models for transforming the behavior of people on the platforms, in the facilities they provide, into something which they can act upon and work with. Another typical example, since you ask me about this: how did it come about that the Facebook Like has become one of the most important indicators of the preferences of people, now used not only by Facebook but by everybody across the web? The Like is an artificial indicator of some sort of agreement, affinity, approval. The status of what it conveys is very vague, yet it's being used by companies to provide recommendations, to train algorithms for recommendations, and to produce a number of personalized feedbacks which people act upon and which shape their interaction accordingly. So these are some of the issues we would like to see discussed. What if these data worked differently? Not only the question which is in the ears and in the minds of everybody: don't let these companies surveil us. That is an important issue, don't misunderstand, but there is an additional issue. And the additional issue is that this is a cut on reality. Using a Like to mark user preferences and to recommend and structure the behavior of users is a very limited way of interacting with and shaping the behavior of people.

BP And I think you're right. I was in college in 2002/2003 when Facebook launched before the Newsfeed even existed, and so I kind of have both a life pre and post internet, pre and post social media. But I have kids now who are going into middle school and they don't have that. And the pressure to be part of social media is enormous, and at the same time, the research about its impact on kids is alarming and so I'm struggling with that as a parent. And a Like can mean so many different things: “I acknowledge that you posted this,” “I actually like this,” “I just want you to notice me so I'm engagement farming with you.” A Like could mean a million different things, and then as you said, that signal gets vacuumed up into a world of e-commerce and a world of other things that try to use that data point for their own purposes.

JK Advertisement in particular, absolutely. So there is the issue of privacy and surveillance, and what I'm trying to say is that we also have an additional issue. The way we're being tracked and surveilled is also made, and it could be different, and it's for us to decide how different it should be.

[music plays]

BP All right, everybody. It is that time of the show. We want to shout out a community member from Stack Overflow– somebody who came on and helped to answer a question and save a little knowledge from the dustbin of history. Awarded June 3rd to Ebrahim Ghasemi: “What is the structure of an application protocol data unit (APDU) command and response?” That was the question to which Ebrahim had a great answer and earned a Lifeboat Badge, and 18,000 other people have viewed and been helped by this question, so appreciate it, Ebrahim, and congrats on your badge. As always, I am Ben Popper. I'm the Director of Content here at Stack Overflow. Find me on X @BenPopper. If you have questions or suggestions, you can email us: podcast@stackoverflow.com. If you want to be a guest or you want to suggest a topic, hit us up there. We’ve been having lots of listeners come on as guests or help us set up what we're going to talk about next. And of course, if you enjoyed today's conversation, the nicest thing you could do is leave us a rating and a review.

JK I'm a professor at Luiss University now. I was earlier a professor at the London School of Economics, part of the Information Systems Group and the Department of Management. I've published many articles, as we academics do, and written several books. The book can be found, I think, at MIT Press and on Amazon. We have also introduced a website for the book. So the fundamental idea of the book, Ben, is the role of data in our society, in our economy, and the way these data are being produced, the way they're being used, and the structural, or, if you like, social effects that they have.

BP Anybody who's listening, if you're in software development, you know about data. Most Stack Overflow users prefer to use ad blockers, as we know, and be anonymous in their screen names. So if you're interested in a book about how data is being used, not just what it is, but how it shapes the world's view of who you are and what you want, definitely a book worth checking out. Thanks for listening, everybody, and we will talk to you soon.

[outro music plays]