Russ Altman: Hi everyone, this is Russ Altman from The Future of Everything. We're on the cusp of another election season, and as people across the country educate themselves on issues and candidates, they also have to worry about separating fact from fiction. We know that people have been spreading misinformation for all of time, but in the last 10 to 20 years, it's really gotten out of hand. In early 2022, my guest, Johan Ugander, shared his research to better understand the ways information spreads online.
It's very important because it turns out that bad information can spread very easily. We're rerunning this episode today, and I hope you'll enjoy it, think about it, and try to use it when you're making decisions at the poll this year. If you enjoy The Future of Everything podcast, please subscribe or follow on your favorite listening app.
It helps us, and you'll hear about all the episodes as they come out. Social media platforms like Facebook, Twitter, Instagram, TikTok, and others have become an important way to spread information. We used to depend on the major networks and their TV and radio affiliates for the news.
Now, however, we can often learn about things from friends or friends of friends or friends of friends of friends and so on, on these social media platforms. I must say, I've noticed that I can often learn more about breaking news events by checking eyewitness reports on Twitter. Eventually, of course, the major news outlets catch up and may eventually have better and more comprehensive information, but the initial situation is often available from tweets.
I was once on a delayed flight to Houston, and nobody at the counter knew why the flight was delayed. I was able to use Twitter to figure out that there had been a bomb threat and an evacuation of the terminal we were supposed to be flying into. That was immediately clear from Twitter, and it took quite a while for the gate agent to understand what was going on. Anyway, this spread of information from social media obviously raises many issues, since the folks providing the information may not be trained journalists (and probably aren't).
They may not be objective, may have an axe to grind, and may be trying to influence people for reasons other than the pure pursuit of truth. In fact, scientists have studied the rate of the spread of news on social media, especially rumors, which are often exciting and novel, and they have found that false news spreads faster than true news. Now, we can all have theories about why that may be, but it's a pretty important finding.
With true news, it seems important to have people receive true information (I can't even believe I need to say that) as they make decisions about their opinions, their preferences, and their choices. If they're hearing false information, they may make choices that are detrimental to themselves and others.
Although it is very possible, indeed probable, that the people who are sending around the false information may be benefiting. So we all have an interest in understanding how information flows and what can be done to give truth a chance when it may be slower and sometimes more boring than false information. I spoke about these issues with Johan Ugander, a professor of management science and engineering at Stanford University.
He has studied the spread of information on social media, uh, and had interesting things to say. Later in our conversation, we learned about other projects he's pursuing in analyzing large volumes of information to understand complex social systems. But first, social media.
At the start of our conversation, I asked him about the motivation for his work in understanding the speed and path of information spread, both true and false and how one approaches such studies. Johan, you've studied differences in the rates of spread of information on Facebook and other social media, depending on whether it's true or not true, and you've had some surprising findings. Can you tell us what you did and what did you learn?
Johan Ugander: Yeah, so there, there's been a lot of interest, um, over the last 10 years, but particularly after the 2016 election, in, um, understanding how falsehoods spread on social media. Uh, you know, to what extent is mainstream media versus social media responsible for all sorts of information, true or not, reaching people. Um, and so there's been, uh, some really sort of large-scale, data-driven attempts to, uh, shine light on this question.
Uh, and so the recent work that I've been involved in, uh, looks at this large corpus of, um, news that spreads on Twitter, and it tries to take a machine learning perspective and ask, how can we classify true things versus false things, and can we learn anything from the machine learning classifiers that are derived from that? Russ Altman: Based on what you just said, it sounds like you're gonna try to determine truth or falsehood not so much from the actual content of the message, but from the features it has as it flies throughout social media.
I'm just asking. Johan Ugander: Perfect. So yeah.
Great. So there's been lots of approaches; it's sort of a multi-method question, right? So some people are looking at the content and using language models.
Um, some people are looking at sort of the psychology of what people react to more carefully. Um, and this particular lens that we're using in this work is asking do true things and false things spread differently. Russ Altman: Okay.
Johan Ugander: So it's possible to imagine, or rather it's well known, that different things do spread differently. So we know that, and we verify this as part of our work, that things like petitions spread really differently than news. Cuz news is timely and has a certain dynamic around the fact that it's spreading in a timely manner.
Whereas petitions get passed more as sort of a word-of-mouth process. Russ Altman: Mm-hmm. Johan Ugander: There's more of a sort of game of telephone being played, where they tend to spread more person to person, as opposed to transmitting through, like, broadcast, um, hubs.
And so, it's well known, I should have opened with that, that different things spread differently. And so the question here was, do true things and false things spread differently?
Russ Altman: There are certain examples of things that are clearly true and clearly false. And so I just want to take a moment to figure out how we define truth and falsehood, just under the experimental conditions.
Johan Ugander: That's a central question of this literature. Um, and it's often, some people say, you know, you give me a definition of true and false, I will solve the problem for you. Sort of like the entire problem is coming up. . .
Russ Altman: Hmmm.
Johan Ugander: . . . with a good definition.
Um, and so the work that we're, the sort of the line of work that we're contributing to where we're not the first ones to take this perspective is to focus on content that's been fact checked. And then determined by fact checkers to be either true or false. Russ Altman: Okay.
Johan Ugander: Now that's a very specific sampling frame. So when you look at content that's been fact-checked, you're typically not looking at sports box scores. You're typically not looking at, sort of, obituaries; nobody's fact-checking an obituary.
Well, there can be sort of biographical details that could be fact-checked, and I'm sure that somebody has falsely claimed that the Red Sox beat the Yankees. But more broadly, there's sort of some attention being paid here just to things that are being fact-checked.
So when we talk about true news versus false news, I should actually be very specific, and that's an important part of what we were, um, uncovering: we're looking at true rumors, we'll call them, versus false rumors. Something that's rumor-like enough to raise the suspicions of fact checkers, and then was determined to be true, versus rumor-like and then determined to be false.
And that's very. . .
Russ Altman: Yeah.
Johan Ugander: . . . different than comparing falsehoods versus sports box scores, like news stories about sporting events. Russ Altman: Right.
But it is very interesting, and I will compliment you on your choice of problem, because those are the ones that are at the edge, where you could argue people need the most help. Johan Ugander: Sometimes when people find out that these studies are focusing on fact-checked stories, they're like, oh, but those aren't, that's not what true news looks like. And that's a fair objection, but I think it's actually a better question than looking at falsehoods versus all true things, cuz then you just get these baseline differences: sports scores have, for example, a different timeliness.
People tend not to care about sports scores from three days ago, whereas news stories still have a certain relevance three days later. And so that means that they do spread differently, but for kind of an uninteresting reason. So there's a deliberate, um, attention being paid to kind of the boundary of truth.
Sort of like looking at things that are across or not across the line of truth. Russ Altman: Yeah. Johan Ugander: And asking, does flipping that single bit, sort of like changing that one thing, um, change how they spread? Does it somehow spread more, spread deeper, spread broader?
These various questions that people have asked. Russ Altman: Great. So, okay.
Having interrupted you several times, I'm now gonna allow you to tell us about the work. Johan Ugander: Setting up the question is really important. I mean, that's, that happens in all of science, but sort of like understanding the claims comes down to understanding the question.
So the, um, the prior work had found that, with these true versus false rumors, false rumors spread further; they reach more people, and they also tend to have, uh, a deeper cascade structure. So the cascade structure, you can envision a tree that's sort of spreading out with branches. Um, and, sort of, what does that tree look like?
Is it a very tall tree or is it a very broad tree? Um, is two different ways that trees can be different. Russ Altman: Right.
Johan Ugander: Um, and at a given size you could ask, sort of, okay, bigger trees are also gonna be either taller or broader in some way, and we'll get back to that. But they found that, um, false news was bigger, that these trees. . .
Russ Altman: Yes. . .
Johan Ugander: . . . that were growing, these trees of information diffusion, were bigger; broader in the sense of, like, the max breadth they were hitting at any given point, how many times they branched; they were deeper; and they were spreading faster, which is a velocity question, which is not quite structural, but it's still sort of metadata associated with the diffusion tree. Russ Altman: Can I think of the leaves of the tree as the recipients of the news?
In other words, uh, one branch might be all the people. Johan Ugander: Yeah. The people on the inside also received the information, but they also chose to retransmit it.
But somehow the leaves and all the little, uh, nodes inside the tree, uh, constitute kind of the population of people who this reached. Yeah. Russ Altman: And when it's false news, you found a big bushy tree.
Uh, and truth kind of isn't? I'm asking. Johan Ugander: Yeah. So the prior finding was, there was attention drawn to this, like an alarm sounded about two years ago, in 2018, that false news is spreading more.
And this was a collaboration with Twitter, and they released this very rich data that's actually really tricky to analyze. Because you're not sitting with, like, oh, this is a group of, um, sort of old people and young people with certain demographics. You're sitting with a bunch of bushy trees in one hand and a bunch of bushy trees in the other hand.
And it's sort of hard to compare them. So what we did in this work, uh, this paper that came out this past week, was more careful matching, saying, well, let's take all the trees that are exactly of size 100 and compare those in a bunch of structural ways. Um, and that's where we found that, you know, okay, so false news does spread broader.
That was, that checked out completely. Um, what we found was, when you control for size, false news doesn't look different; the trees don't look different. They're not broader or deeper.
Like, they're not more squat or taller. Um, they're sort of spreading in the same way. And then this baseline observation about size just means that there's a higher, like, elementary, uh, velocity would be a speed statement.
But there's an elementary, like, transmission rate that's higher for false news, but it's not sort of going through hubs or playing this telephone game in a way that's different than true news. Russ Altman: Huh. Johan Ugander: Um, so it was sort of using matching methods, sort of ways to do treatment-control analysis, um, to understand the differences between true and false news.
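[Editor's note: the structural comparison described here, summarizing each cascade by its size, depth, and maximum breadth and then comparing cascades of equal size, can be sketched in a few lines of Python. This is an illustrative sketch with a toy cascade, not the paper's actual analysis code; the parent-pointer representation of a retweet tree is our assumption.]

```python
from collections import defaultdict

def cascade_stats(parents):
    """Structural summary of one diffusion tree: total size, depth
    (longest root-to-leaf path), and max breadth (largest number of
    nodes at any single depth).

    `parents` maps each node to the node it was reached from;
    the root (the original poster) maps to None.
    """
    depth = {}

    def node_depth(n):
        # Walk up the parent pointers, caching depths as we go.
        if n not in depth:
            p = parents[n]
            depth[n] = 0 if p is None else node_depth(p) + 1
        return depth[n]

    for n in parents:
        node_depth(n)

    level_counts = defaultdict(int)
    for d in depth.values():
        level_counts[d] += 1

    return {
        "size": len(parents),
        "depth": max(depth.values()),
        "max_breadth": max(level_counts.values()),
    }

# Toy cascade: user 0 posts; users 1 and 2 retweet 0; users 3 and 4 retweet 2.
toy = {0: None, 1: 0, 2: 0, 3: 2, 4: 2}
stats = cascade_stats(toy)
print(stats)  # {'size': 5, 'depth': 2, 'max_breadth': 2}
```

Matching then amounts to binning cascades by `size` and comparing the distributions of `depth` and `max_breadth` within each bin, so that "bigger" is controlled for before asking whether true and false cascades are shaped differently.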
Russ Altman: Yeah. So that is interesting, cuz if I, just to restate what I think you said, uh, Johan Ugander: Yes, please. Russ Altman: I'm thinking about the tree as, like, the spread of the information. That spread seems to be happening faster, and you confirmed that, but what the tree looks like, um, at the end of a certain period of time is very similar to what it might look like if it was true news.
Johan Ugander: Right. And the alternative hypothesis, sort of, um, before this work, there was this debate about whether this was a battle on four fronts. Are we trying to make false news smaller, also make it go less deep, make it go less broad, and make it go less fast?
But essentially what we find is it's sort of a battle on one front. The false news just spreads more and has a higher infectiousness. Um, and there's a huge connection.
There's a long-standing connection between epidemiological metaphors and, sort of, the spread of information. . .
Russ Altman: I wanted to ask about this.
Johan Ugander: . . . viruses. Yeah.
Russ Altman: So please go for it. Should we think about it as an infection, or is that gonna be a misleading metaphor? Johan Ugander: Yes and no.
Um, so, information spreads differently than viruses. The main way that information spreads differently is that people are making decisions when they receive information, and there's cognition involved. Um, you know, epidemiological transmission is about sort of particles landing, uh, you know, and being taken up into the respiratory system, and there, more of a particle-system interaction model is pretty accurate.
Whereas information, it enters our brain, and our brain does miraculous things. Um, you know, we decide whether, oh, this information reached us from two different people.
Do I then believe it more? Well, do those people know each other? There's a whole social psychology to how information spreads.
Russ Altman: Yes. Yes. Johan Ugander: Whereas, um, when two people cough on you, it doesn't really matter if they knew each other.
There's no, like, your brain doesn't get to decide whether you get sick. So there are definitely limits. But at a basic level, it's still about, um, a spreading process.
So there's a lot of common tools. Um, and rather than saying that one borrows from the other, um, I'd rather say that they both use sort of a common basic toolkit, and then there are certain modes of inquiry that are specific to epidemics and certain ones specific to information diffusion.
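[Editor's note: one piece of that common toolkit is the branching-process model of spread. As a rough illustration of the "one front" finding, and not the authors' actual model or code, here is a toy Galton-Watson-style simulation in which two rumors share identical structural mechanics and differ only in a per-exposure transmission probability; the higher rate alone produces systematically bigger cascades. All parameter values are invented.]

```python
import random
from statistics import mean

def simulate_cascade(p_retweet, followers=5, max_nodes=10_000, rng=None):
    """Simulate one diffusion cascade as a branching process: each
    person reached exposes `followers` others, each of whom
    retransmits independently with probability `p_retweet`.
    Returns the total number of people who retransmitted."""
    rng = rng or random.Random(0)
    frontier, size = 1, 1
    while frontier and size < max_nodes:
        # Count how many newly exposed users retransmit this generation.
        new_frontier = sum(
            1 for _ in range(frontier * followers) if rng.random() < p_retweet
        )
        size += new_frontier
        frontier = new_frontier
    return size

rng = random.Random(42)
# Identical structural model for both; only the transmission rate differs.
true_sizes = [simulate_cascade(0.12, rng=rng) for _ in range(500)]
false_sizes = [simulate_cascade(0.16, rng=rng) for _ in range(500)]
print(mean(false_sizes) > mean(true_sizes))  # higher infectiousness, bigger trees
```

Because both variants use the same mechanics, cascades of equal size look alike; the only lever separating them is the elementary transmission rate, which is the shape of the result described above.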
Russ Altman: You know, one of the reasons we study epidemiology and spread is because you want to intervene. In the case of viruses, as we are all acutely aware, the idea is to stop the virus from spreading. Now, if you have the goal of stopping a piece of news from spreading, are there lessons from your analysis? In many ways, I think this is a very germane question, because we haven't talked about it, but I presume that one of the motivations of your work is, um, basically the logic that false news is bad or damaging or dangerous, and therefore it would be good to be able to stop it.
I could imagine people disagreeing with you on that, and we can talk about that later. But let's say you wanna stop information. Do these infectious models for social media give you some ways forward in stopping or slowing. . .
Johan Ugander: Yes.
Russ Altman: . . . the progression of bad ideas?
Johan Ugander: So I think it's a great question, and I think that it's also a very School of Engineering question. Because, um, ultimately I think of social media platforms as built systems, and I tend to, uh, reject kind of the naturalist perspective that I'm somehow, like, David Attenborough watching birds.
Like, there's a question about how this system was built and could be built; that's sort of first order. Russ Altman: Yes. Johan Ugander: Um, so thinking about that, um, two kind of metaphors that are very, um, front of mind as we've been living in this pandemic are, kind of, oh, well, you know, there's the closure of sporting events, which is sort of saying that, uh, you know, mass super-spreader events, we should limit how widely things spread.
And that would be very justified if we felt that false news was spreading more through, sort of, high-degree retransmission events, or if false news was preferentially finding its way through hubs. But we don't see that to be the case. So, um, you could close down sporting events, which would mean, like, uh, limiting the distribution of major, mainstream media accounts, sort of the. . .
Russ Altman: Yeah. . .
Johan Ugander: . . . size of any given account. Um, but you'd be shutting down false news and true news kind of equally, is sort of the finding. And I like to joke that there's a really easy way to stop the spread of false news.
It's to stop the spread of all news. Russ Altman: Yes. Johan Ugander: And it's a reminder that for various reasons, as a civil society, we do enjoy, um, information spreading, and we just want to try and limit certain types of information spreading.
So the cutting edge on these types of questions is actually coming from psychology work that's more in the lab, and it hasn't quite, you know, hit full, sort of, generalizability. But, um, there's these very nice studies that are looking at, sort of, when you draw people's attention to the accuracy of content, you tend to, uh, differentially limit the spread of false information.
So, for example, the platform design decision that Twitter has made: if you try to retweet content that, um, you haven't read, so you see a news story and you hit retweet, it'll ask you, uh, you haven't read this, are you sure you wanna pass this along? Russ Altman: Ahhh.
Johan Ugander: So it's a friction that they've added to their product. You know, everybody who's worked in online platforms knows that every click is costly. So this, you know. . .
Russ Altman: Right.
Johan Ugander: . . . goes back to Amazon patenting one-click shopping two decades ago.
Russ Altman: Right. And you're risking, it could be annoying your customer. Johan Ugander: You're risking annoying your customer, but it's been validated in lab settings, and then Twitter's adopted it.
Doing that differentially limits the spread of false information. And those types of product changes, I think, are really important; as we said, it's one of our main tools right now, just trying to draw attention to, sort of, accuracy and the consequences of your decision, and not necessarily driving frictions to zero, which was the momentum of the last decade, I think. Russ Altman: And I recently learned that you were also looking at the criminal justice system and issues of parole.
That was a surprise. So can you tell me, um, how does a guy like you wind up thinking about the issue of parole? What are the tools that you're using, and what are you finding in that domain? Johan Ugander: That work is really fascinating, and I hope that it, uh, can be really impactful.
So this was work that I, uh, got involved with. Um, Nick McKeown, a professor over in Computer Science, has been, um, this has been a passion project for him for a while, uh, and, uh, two or three years ago, he recruited one of my PhD students, Jenny Hong, together with one of his own students, Catalin Voss. And, um, essentially, uh, it comes down to trying to understand the California parole system, focusing on California because there's actually a lot of heterogeneity across states.
Um, and, uh, it started with a public records request, and then eventually it became a lawsuit that the EFF came in on, and eventually an enormous corpus of data was handed over to them, which is, um, 35,000 transcripts from parole hearings of people serving life sentences in the California prison system. And the goal has been to try and understand this human process. This, um, is a process where, uh, commissioners sit down with prisoners, and there's lawyers and all these sorts of things, for hour-long meetings, or often longer.
And they sort of review a case, and very little structured data exists. It's not something that is well described by, like, a bunch of numbers. Russ Altman: Mm-hmm.
And so the question was, uh, the goal has been to use something like machine learning to audit these systems, not to replace them at all. That's a really important basic point. It's using it as a tool for auditing. So looking across the many years that we have these 35,000 transcripts, and asking, well, each one is about 150 pages, so that's more than several million pages, that'd be. . .
Russ Altman: Yeah.
Johan Ugander: . . . very hard for any human to read.
Um, now, importantly, this is a computational audit of a human system. So there already were a bunch of people making decisions. But the goal is to kind of evaluate that system at, like, a zoomed-out level for inconsistencies or arbitrariness.
Um, and because it's sort of unstructured data, that's one of the ways that I became involved, because that's, broadly speaking, my area. So I focus very much on social networks, cuz they're their own particular messiness and sort of their own interesting, rich object, but also these large, you know, 150-page text documents. You then use tools for natural language processing, typically, to try and understand, sort of, what are the determinants of, um, somebody being granted parole versus not granted parole. How do these things break down by, um, race, by time served, by sex, etc. Even though there was this, like, initial, slightly adversarial relationship over getting the information released. . .
Russ Altman: Yeah. Johan Ugander: Um, there's ultimately been a very productive conversation with the, uh, California prison bureau and parole board, etc., and even with the governor's office.
So, ultimately, on these types of parole questions, the California Governor's Office has a certain amount of, um, control to overturn parole decisions, etc. And so they are very interested in monitoring the parole process and trying to, you know, release more people, but do so in a societally safe way. Um, and so, uh, ultimately I think it's really, like, sunlight on a system, helping society see inside something. . .
Russ Altman: Yeah.
Johan Ugander: . . . that's been very murky. I think that's a useful metaphor for it.
Um, helping society see a system that's been very opaque. But then also it's about making recommendations, and hopefully in a constructive way, pointing out that, hey, um, there appear to be, sort of, inequities across prisons, or there are very different processes used at different prisons, etc., or the following, uh, factors of a case, related to things that might have happened or not happened in prison, seem to be, um, surprisingly predictive. We should look further into that.
Um, a lot of these claims that we've been able to make are not causal, because there's huge amounts of correlation between all the case factors. Russ Altman: Right, right. Johan Ugander: Um, and they've been extracted, you know, how many infractions did the person have during their time in prison?
That's something that we've had to extract across 35,000 documents of hundreds of pages each. Um, and there's a certain accuracy there, and validating that. So it's really that we're trying to establish a non-zero value to this perspective of, um, uh, doing a computational audit, and then building credibility for that, and then working with criminal justice reform, uh, groups to try and push for changes. Russ Altman: So, okay.
So just to go back a little bit, into a deeper dive into what you did. You have these, uh, 35,000 hearings that you said are 150 pages each. You're gonna use computers to look at what was said, and I guess you have to make decisions on the topics of interest, and then when you extract the information, you can put it in a structured form.
So you can look at percentages, like in how many of them was there a discussion of X, and in how many a discussion of Y, and was that discussion in a positive light or a negative light? So can you tell me what the, like, big headline topics are that, at least for the first generation, you're looking at? Johan Ugander: Yeah.
So, uh, there's actually even, like, basic information that hasn't really been released in structured form before. So just, uh, if we just look at the information that we can glean from the cover sheets, um, which give a little bit of structure, then you can use basic pattern-matching algorithms to extract things. Um, Russ Altman: mm-hmm.
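[Editor's note: as a toy illustration of that kind of cover-sheet extraction. The real transcripts' format is not public, so the layout and field names below are entirely invented, but simple pattern matching of this flavor is enough to pull semi-structured fields into tabular form.]

```python
import re

# A hypothetical cover-sheet snippet; the field names are illustrative only.
cover_sheet = """
INMATE NAME: DOE, JOHN
COMMISSIONER: SMITH
DECISION: PAROLE DENIED
"""

def extract_fields(text):
    """Pull structured fields out of a semi-structured cover sheet
    using simple line-anchored pattern matching."""
    fields = {}
    for key in ("INMATE NAME", "COMMISSIONER", "DECISION"):
        m = re.search(rf"^{key}:\s*(.+)$", text, re.MULTILINE)
        fields[key.lower().replace(" ", "_")] = m.group(1).strip() if m else None
    return fields

print(extract_fields(cover_sheet))
# {'inmate_name': 'DOE, JOHN', 'commissioner': 'SMITH', 'decision': 'PAROLE DENIED'}
```

Run over 35,000 transcripts, a pass like this turns free-text hearings into a table that grant rates and commissioner assignments can be computed from.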
Johan Ugander: Looking at the grant rates across commissioners. So, uh, there's a classic criticism of these types of, uh, judicial systems, which is understanding, like, are, um, commissioners or judges all holding people to the same standard, or how much does your release hinge on getting lucky with your commissioner? Russ Altman: Right, right.
Johan Ugander: Um, and when those types of questions are asked, typically in the parole setting there's a strong non-randomness to how commissioners are assigned. So there are certain commissioners that tend to handle cases at, sort of, supermax, level-four prisons, and it's kind of obvious that they're gonna have lower grant rates, even among lifers. Russ Altman: mm-hmm.
Johan Ugander: And so there's a classic response from, sort of, the prison side, or sorry, from the parole board side, which is, well, you expect heterogeneity because of the differences in the cases they're handling. So one of the things we do, cuz we're able to extract it from all these hearings, is we look at, well, how do these grant rates stack up? Like, what are the grant rates by the different commissioners?
And then, what happens if we, this is sort of a computational trick, the fancy word for it is randomization inference. Russ Altman: Mm-hmm.
Johan Ugander: What happens if we shuffle the assignments of commissioners to the cases within a prison, within a year? Um, and that controls for the fact that the guidance has changed over time. Russ Altman: Right.
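[Editor's note: the shuffling trick can be sketched as follows. This is a rough, illustrative implementation of randomization inference on invented toy data (the field names and numbers are ours, not the study's): within each (prison, year) stratum, we permute which commissioner got which case and record the grant rate one commissioner would have had under each permutation, giving a null distribution to compare the observed rate against.]

```python
import random
from collections import defaultdict

def shuffled_grant_rates(cases, commissioner, n_shuffles=1000, seed=0):
    """Randomization inference: within each (prison, year) stratum,
    shuffle which commissioner was assigned which case, and record
    the grant rate `commissioner` would have had under each shuffle.
    Case outcomes stay fixed; only the assignments move."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for c in cases:
        strata[(c["prison"], c["year"])].append(c)

    null_rates = []
    for _ in range(n_shuffles):
        assigned = granted = 0
        for stratum in strata.values():
            names = [c["commissioner"] for c in stratum]
            rng.shuffle(names)  # permute assignments within the stratum
            for name, c in zip(names, stratum):
                if name == commissioner:
                    assigned += 1
                    granted += c["granted"]
        null_rates.append(granted / assigned if assigned else 0.0)
    return null_rates

# Toy data: in one prison-year, commissioner A granted 0 of 10 cases
# while commissioner B granted 10 of 10.
cases = (
    [{"commissioner": "A", "prison": "P1", "year": 2019, "granted": False}] * 10
    + [{"commissioner": "B", "prison": "P1", "year": 2019, "granted": True}] * 10
)
null = shuffled_grant_rates(cases, "A")
# A's shuffled grant rates concentrate near 0.5, far from the observed 0.0,
# so A looks like an outlier relative to a common-threshold null.
print(min(null), max(null))
```

A commissioner whose real grant rate falls far in the tail of this shuffled distribution is one whose decisions can't be explained by the mix of cases they happened to receive.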
Johan Ugander: Uh, so, accounting for which prisons you were active in, and for whether you were a commissioner who was more active in the past, we find that there are a lot of commissioners whose grant rates are far outside what you would expect if everybody was following a common threshold. Russ Altman: Right. You have been listening to The Future of Everything podcast with Russ Altman.
I wanna remind you that The Future of Everything started out as a radio show on SiriusXM. So you'll hear references to that. Now it is a hundred percent podcast, but we still have access to the great shows that we taped with SiriusXM.
There are more than 215 of them, and they cover an extraordinary range of topics. If you're enjoying the podcast, please consider subscribing or following so that you can be alerted to every new episode and never be surprised by the future. Maybe tell your friends about it too.
Definitely consider rating and reviewing it. That helps us grow, improve, and also spreads the word. You can connect with me on Twitter @RBAltman and with Stanford Engineering @StanfordENG.