Good morning, and thank you for joining us. I'm Robert Perkins, a science writer and press officer at Caltech. It's my pleasure to welcome you to Conversations on Artificial Intelligence: Combating misinformation online.
This series is hosted by the Caltech Science Exchange and gives us a chance to hear from scientists and engineers who are working on the cutting edge of AI and machine learning. I'm joined today by computer scientist Anima Anandkumar, co-director of the AI for Science Initiative at Caltech, and election scientist Michael Alvarez, co-director of the Caltech/MIT Voting Technology Project. These two researchers have teamed up to tackle a looming challenge to democracy and our increasingly internet-connected lives: misinformation, disinformation, and harassment online.
With the 2022 midterm elections upon us, it's more important than ever to address questions like how do you know what's real online anymore? Whether intentional or accidental, fake news spreads like wildfire through social media platforms these days. Alvarez and Anandkumar have tapped into the power of machine learning to uncover the spread of disinformation on Twitter, Facebook, and other platforms.
They have also extended their research to identify and prevent online harassment in an effort to create a safer and more trustworthy social media ecosystem. Anima and Michael, thanks for being here. It's our pleasure.
Thank you, Robert. Thank you. Let's start off by just explaining a little bit about AI.
Could you give us a brief definition of what AI is? And could you each talk just a little bit about how it intersects with your own research? Let's start with you, Professor Anandkumar.
Yeah, certainly. Intelligence is the ability to learn and adapt to different environmental changes. And we know about biological intelligence; we ourselves are the epitome of that.
But how do we now create that in machines, the ability to learn from data and then ultimately to make decisions and adapt to changing environments? I work on various aspects of AI, how to do self-supervised learning, meaning without asking humans for labels, can you figure out patterns from data itself and learn in a self-supervised or unsupervised manner? The ability to adapt to different environments and learn online; that's called reinforcement learning.
And ultimately, I'm interested in using AI for a variety of domains in sciences and social sciences like we see here today. Some of my research initiatives have been geared towards sustainability, how to tackle climate change, how we can model scientific phenomena at various scales, whether it comes to capturing carbon dioxide and sequestering deep underground or forecasting extreme weather events accurately. And so a range of these scientific problems require designing new AI methods.
Thank you. Dr Alvarez. Well, in my research group, we focus more on using the tools.
And the kinds of tools that are being developed by Anima's group are the kinds of tools that we're applying in lots of exciting and interesting ways in social science. Today, it's really pretty remarkable to be a social scientist because of the vast quantities of data that we have available to us. The size and scope and complexity of those data sets require that we use these new tools that are coming from AI and machine learning.
And we're just really excited to be at a place like Caltech, where I have colleagues who can teach us how to use these tools, and maybe help develop them, and where we also have students who, quite frankly, are some of the most amazing students to work with, who are really interested in learning and applying these tools to solve real-world problems. Excellent. How did you both become aware of the issue of disinformation and misinformation online?
And what led you to want to apply AI tools to study and address it? It's hard to avoid misinformation. You don't necessarily even have to be on social media to see the impact of that misinformation in a variety of settings, especially surrounding COVID-19, which is the subject of some of the projects that Mike's group studies.
And we're incorporating AI to help them do that. And how I got started on this research is by connecting with Mike. When I was exploring joining Caltech in 2017, I met up with Mike and was really fascinated at the range of really complex issues he's tackling.
And I felt like AI could really enable that research very effectively. And that's also the power of Caltech: its small size enables people from very different disciplines to come together, and students who have a strong foundation in both the sciences and the social sciences can make a really unique impact here.
What have you learned so far about how misinformation spreads? Well, I'll take a stab at that, and then I'll pass it over to my colleague. Sure.
We've been working on, I would say, multiple fronts. A lot of the projects that we've worked on so far have focused on trolling and harassment, on very targeted attacks on individuals. Much of the work that we've focused on in the last couple of years has tried to identify those types of attacks using large social media data sets, at this point in time usually Twitter data, and to figure out how we can watch and monitor these kinds of conversations and this kind of harassment as it evolves and changes over time. Because the harassers know the platforms are monitoring them, and they know that people like us are also trying to follow and track them, their behavior is evolving.
And that is a very difficult problem to try to resolve. And again, we've done some research on that area. We've been looking in other areas, too.
We've been collecting Twitter data on tweets about election 2020. We've done a fair bit of work on some of the information and misinformation that was being spread then. We've tried to look at things like Twitter's attempts to label and block some of the misinformation in 2020: were those attempts successful?
And I'll have to just tell you that, like other researchers, we found mixed evidence. And so the platforms are clearly having trouble policing the spread of misinformation. And I'll stop there because I could go on for the next 60 minutes and talk about these things, but I'll pass it over to my colleague.
Yeah. Now, Mike is the expert here on this. But in my view, the core of it is algorithmic amplification: automated amplification at such a large scale, combined with these targeted attacks, many of them carried out by bots, and the inability to authenticate who the real users are.
The ability is there, but it doesn't exist in these platforms. And that also means you're just making these attacks bigger in a very short amount of time. I think a lot of that, the core is algorithmic amplification, in my view.
Excellent. Okay. Professor Alvarez, you study elections.
What impact does social media have on elections? Well, at this point in time, social media is the place where most people get their information about elections, in particular younger people. And so the information that's being spread on social media, whether it's fake or not, is really critical.
The candidates themselves have also turned to using digital advertising in really massive ways in the last couple of election cycles, so they're on these platforms as well. And it ranges from the things that people of my generation are familiar with, like Facebook and Twitter, to newer forms of social platforms and content like TikTok. Again, there's all sorts of information flowing on these platforms.
It flows very quickly, unlike in the past, when I first started participating in politics myself, when things were pretty much in the papers or on the evening news. With their rapid spread of information and the lack of real fact checking, these platforms give us this problem where people are susceptible to seeing lots of misinformation and misleading information. And it's becoming, I think, one of the biggest problems that we're facing in today's world: how do we fact check a lot of the information that's being spread on social media?
Well, and whose job is it to fact check it? Well, ideally we'd like to be able to do it. But again, as Anima could talk about more, that's a very difficult problem.
We've talked a fair bit about how to try to do that, but technically speaking, it's a very difficult problem. And I don't know if you want to speak to that, Anima. Yeah.
One aspect is, especially in a fast developing topic, what are even the right facts? First, there has to be an authoritative source on somebody claiming certain facts. And that takes some time to establish what they are, especially let's say the vaccines are getting developed, there are scientific studies, but then there's also some very prominent personalities who are speaking against that, so then who is considered authoritative in those scenarios?
That's tricky for the platform to figure out. That's why I think we need the experts, especially on these scientific topics: what is the collective opinion of a group of experts? Then we establish those as the facts and treat anything that deviates hugely from them as misinformation. And ultimately, once we arrive at that, then how do we use AI to detect misinformation?
And how do we detect bad actors? That's the second point. But even the first point sometimes takes time.
Ultimately, I believe we can come up with what are the facts in almost every scenario. But for platforms to be able to react quickly is a challenge. Okay, so it sounds like a problem of scale and a problem of speed.
Yes. Okay. Let's see.
What's at stake here? Can you give us a sense of the scope of the problem and what'll happen if we just don't address it, if we just let it be as it is right now? Well, that's a big question.
The problem ranges, again, from real deliberate targeted attacks at individuals. Recent polling from the Pew Research Center finds that four out of 10 Americans report some form of online harassment and bullying. That means that many of the people who are watching this right now have experienced this themselves.
They've experienced the bullying and the online harassment. Again, that's one piece of the problem. But then there's what we were just talking about, this larger puzzle: trying to police misinformation and/or misleading information. It may not be factually incorrect, but it still may be misleading, and policing it at a large scale, as Anima just talked about, is very difficult.
What's at stake is, on the one hand, we need to stop this harassment online. People should not be harassed online, and that just has to stop. In the macro scenario, we need to be very careful that our democracy isn't eroded because of the spread of either misleading information or factually incorrect information in this election and in particular as we move forward into the next presidential election.
Does misinformation represent an existential threat to democracy? Anima's shaking her head yes. Yeah, I think so.
I think that that's part of why we're motivated to really put a lot of time into this project. And I think that's motivating a lot of our students to get involved as well. That's great.
All right. And can you talk a little bit about the AI tools that you've been building and how they work? What can AI do to help address this issue?
Yeah, I think AI gives us an excellent array of tools to do this research at large scale. Mike's group has been able to analyze billions of tweets very quickly, which wouldn't have been possible before. One challenge here, compared to standard AI frameworks, is that this is not a fixed language or fixed set of topics.
On social media, new topics evolve all the time. COVID-19 wasn't even in our vocabulary when the models were initially trained, and now they have to adapt to this.
And also, especially when it comes to harassment, trolling, and bad actors, they are coming up with new keywords, new hashtags all the time. It may not even be real English words; it's suddenly all new vocabulary. How do we handle these constantly evolving vocabularies and topics?
My research, dating back more than a decade, has been to develop these unsupervised learning tools, what we call topic modeling. Here, we are automatically discovering topics in data by looking at how the patterns evolve. Think of it this way: if many words co-occur, then they should have some strong relationship in representing a topic.
Think of the words apple and orange. If they're occurring together, likely the topic is fruit. We can say that just intuitively.
And so what these methods, what we call tensor methods, do is look at such co-occurrence relationships. If multiple words, groups of words, are occurring together, what kind of topic does that represent? And then this has to be done over a vocabulary of hundreds of thousands of possible words, across billions of documents or even more.
You have to do this at a large scale, and we developed methods that enable us to extract topics automatically. We call these tensor methods because we are looking at the co-occurrences of three or more words together in documents, in different tweets, for instance, and at how we can extract the underlying relationships from these co-occurrences. Fascinating.
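The co-occurrence idea described here can be sketched in a few lines. What follows is a toy illustration with an invented two-topic corpus, not the researchers' actual method: it builds a third-order word co-occurrence tensor and reads topics off the singular vectors of an unfolding, a simplified stand-in for the full tensor (CP) decompositions the research runs at vastly larger scale.

```python
# Toy sketch of co-occurrence-based topic discovery (illustrative only;
# the corpus and the SVD shortcut are invented stand-ins for real
# tensor decomposition methods running over billions of documents).
import itertools
import numpy as np

docs = [
    ["apple", "orange", "banana", "grape", "mango"],  # "fruit" topic
    ["ballot", "vote", "election", "recount"],        # "election" topic
]
vocab = sorted({w for d in docs for w in d})
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# Third-order co-occurrence tensor: count ordered triples of distinct
# words appearing together in the same document.
T = np.zeros((V, V, V))
for d in docs:
    for a, b, c in itertools.permutations(d, 3):
        T[idx[a], idx[b], idx[c]] += 1.0

# Mode-1 unfolding: a V x V^2 matrix whose rows are word profiles.
M = T.reshape(V, V * V)
U, s, _ = np.linalg.svd(M, full_matrices=False)

# Words carrying large weight in the same leading singular vector
# co-occur strongly, so each vector acts as a latent "topic".
for k in range(2):
    top = np.argsort(np.abs(U[:, k]))[::-1][:3]
    print("topic", k, [vocab[i] for i in top])
```

In practice one would use a proper CP decomposition (for example via the TensorLy library mentioned later in the conversation) rather than a plain SVD, and sparse counting rather than a dense tensor, but the underlying intuition is the same: groups of words that co-occur across many documents fall out as rank-one components.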
Can I add onto that? Please, please. Because I really want to make sure that we recognize the incredible importance of Anima's contribution here.
It's not just that we can do this, it's that with her contribution in this area, we can do this at scale very quickly. The techniques that she's been developing, and that some of the students in my group have been working with people from her group to implement, mean that we can take, for example, our data set of all of the tweets about Me Too. It's about 9 million tweets.
We can analyze that data set in a matter of minutes, whereas before it would take many days, maybe a month. And so with these techniques that she's talking about, we can do things in the computational social sciences that we could never do before. It's just a really remarkable revolution in our ability to analyze these large data sets quickly, so that we can actually make actionable decisions and use these data in actionable ways.
And I really want to emphasize just how important her contribution has been in this area. Thank you. Thank you, Mike.
And I think that's the aspect that is very important: having, first of all, computational tools such as tensor methods. These can run very fast on GPUs, or graphics processing units, which drive almost all of AI today. And so we can parallelize, and we can do this at scale.
And at the same time, they have an underlying theory saying that you can discover the topics faithfully in so many interesting scenarios. And I think that's what the power of unsupervised learning is. It doesn't rely on humans going and labeling different topics, and so you can do this online as the topics emerge instead of waiting for human annotators to collect data and discover new topics and then train AI.
Fascinating. All right. And I think you mentioning scale brings me into our next question.
What would it take to make this effective in the real world? What will make this a tool that we can deploy right now? We are already deploying.
I think we are in the process of open sourcing and releasing this very soon. And I do think this can have a big impact in social sciences, among many other domains. Mike.
Yep. No, I agree. Part of what happens in the academic world is that we get a little distracted because we like to publish things.
And so instead of immediately pushing a lot of these tools into some sort of production use, we like to step back, have our colleagues take a look at them, and put these projects through peer review. And so again, we do have a lot of things we've been working on. We've been doing this project for now maybe, what, about three years?
And so we have a lot of interesting things that are now in peer review. And as those start to push through that pipeline, we're going to start releasing a lot of this code and software and start to work on trying to make these into deployable tools that actually can be used in lots of different domains. Excellent.
I'd like to turn to the issue of online harassment and trolling again, if that's all right. Can you talk about how big of a problem this is online? Well, I think I mentioned one little factoid, which is that when you ask Americans whether they have suffered from different forms of online harassment and bullying, four out of 10 say yes.
That just gives you some scope and scale. And I can say that I have myself suffered from various types of harassment online, in particular during the last election cycle because of a lot of the work that we were doing. I was getting some very nasty emails myself.
We all suffer from it. And I think some people suffer from it in really horrific and problematic ways. And that has to stop.
Again, like I said before, as researchers, part of the reason for our motivation in this project is that we've suffered from it ourselves. And we know that we have to do things to try to police these platforms because, to be honest, the social media platforms try, maybe, they do some things, maybe, but clearly they're not doing enough. And to add to Mike, I've suffered a lot of harassment as well online.
And women and marginalized communities tend to suffer more online. And I think this is a huge aspect, especially if you think about teenage girls and the impact that social media is having on them, along with the bullying and harassment, even leading to suicides or really adverse effects on their mental wellbeing. This is really a serious problem.
And I do think in our offline communities, we have found so many ways to tackle harassment. I feel like social media sometimes takes away the humanity in us, because most people would not say those things to your face if they were talking to you in person. But the shield of online social media, where you can pretend to be somebody else and don't expect any impact on your life from saying all these things, I think makes people bolder.
And they also don't understand the consequences, because they don't interact with others in a more organic manner. I think that's one side of it: how do we improve how people interact in general? But the other aspect that Mike talks about, the targeted attacks, with bots and very planned attacks on people, that I think the current social media tools should already be preventing.
I hope more can be done about that. Excellent. Okay.
Let's turn to some potential concerns about this new technology. Is there any worry that these efforts, and you said they're about to be released, could be used to curtail free speech, could be misused in some way by, say, an authoritarian government or something like that? Our tools are open source, and they're really all about helping you understand different topic relationships in evolving text.
And so it's then up to the moderators to decide if this is something that needs to be addressed. And so it's really helping people who are setting policies to handle the scale and the speed of data that is coming at them. It's definitely not up to us to police or frame those questions.
We are enabling researchers as well as social media companies to handle the scale and speed of data. And the other aspect, to me personally: I think algorithmic amplification is not about free speech. We are not curtailing anybody's speech if we say we should limit the amplification of misinformation.
Somebody can shout from their rooftop anything they want; they're free to do that. But to say that that has to reach millions of people, because people tend to believe what is being engaged with at such a massive scale, that's a different story. I think we should separate free speech from algorithmic amplification.
Interesting. Dr Alvarez? I think Anima answered the question quite well. As scientists, in particular once we publish things and make claims about techniques and methods and quantitative findings, we have a responsibility to share the materials that we've used to develop those claims.
Our responsibility, first and foremost, is to the scientific community, to make sure that our colleagues can use these tools and verify that they work as we claim they do. And so that's, I think, the first part of our responsibility. But we do really hope that these tools are ones that are used by the social media platforms.
We are very optimistic. We know that they have very large data science teams. We know that they follow what we do, and so we're very optimistic that, as we release these tools and a lot of our applications, that they're going to take a look at them and pick them up and perhaps use them to try to mitigate if not resolve a lot of the problems that currently exist.
Okay. And related to that, and this gets back to something you mentioned earlier, Dr Anandkumar, what about the question of who gets to decide what the truth is and what is misinformation? You mentioned getting together, say, a list or a group of experts and asking, "Okay, what are they saying about this problem?" Who gets to select that group of experts? And how do we manage that once these tools are fully functional? Yeah, at the moment, the answer is nobody is doing that, right?
Right. True. And ultimately, it is the social media companies who will decide what goes on their platform, what gets amplified.
That's, I think, the aspect where we need to have openness in how they're doing it, and that's still lacking. I know there is the oversight board for Meta, or Facebook, right? Mm-hmm.
And I've engaged with many of the members there earlier. And yeah, there is still a big imbalance in terms of the data science resources that Facebook has compared to, say, what the oversight board has. But still, I think that's a really good addition in terms of having an outside community to monitor and understand how the decisions are being made and to engage with Facebook to come up with policies that are more consistent.
I hope more of that can be done. Excellent. Have you heard from anybody in the industry who's familiar with your research that this is something they're excited to start working with?
Are folks from Twitter and Google reaching out and saying, "God, you got to get this. We need this technology"? I think Anima was just talking about some of those contacts.
And the answer is yes. We've had conversations with various social media companies. There are quite a few people who go through either our undergraduate program here at Caltech or our PhD programs and then go to work at these companies, and so they know us and they're following our work.
And we don't really have a good sense of exactly what they do, but we do know that they follow us, and we hope that they are taking advantage of the science and the knowledge that we're generating. And as we develop things that we think are particularly pertinent or relevant to, say, Twitter or Facebook or YouTube or any of these other platforms, we'll do our best to get in touch with them and let them know what we're working on. Yeah.
And I also want to add Activision Blizzard. It's not a social media company, but a gaming platform. We are about to announce a partnership with them where they're supporting our research to develop tools to help combat harassment in gaming platforms.
And I think having them incorporate many of the tools we develop will greatly help address harassment in gaming. Thank you. All right.
That's fascinating. What I'd like to do now is turn to a few questions that we've received from the audience. The first question comes to us from Megan, who asks, "How will AI keep up with highly variable social mores over time?"
Yeah. The idea is not for AI to make decisions about how we deal with amplification. There has to be a human in the loop.
And I think that's one of the problems today; it's automated amplification. And since the tools we are developing are unsupervised, they can track changing conversations, entirely new vocabulary, and also different languages, which is a huge problem right now. As much as we try to moderate what's happening in English, there are so many other languages in which there have been conflicts and all kinds of threats, even to democracy and elections. That's what these unsupervised learning tools give you.
They're blind to those details because they look at the patterns in the data itself as they're evolving and bubble them up to human moderators, and so we need to have a human in the loop. And there need to be consistent policies to guide these humans in how we judge whether something is harassment. I think there is a huge amount of research done on this.
And what do you deem as harassment? I think one aspect is that harassment shouldn't be judged based on one tweet. You should look at the whole historical context, in terms of how people are engaging with that particular person, or which groups of them are engaging.
And I think that's also missing right now, because whatever moderation is done is at the level of one tweet, which may not seem that bad; but if you see the global context, you may see a systemic issue of continued harassment. To me, the short answer is, yes, we have to be adaptable to different social mores and changing conditions. And if we build agile AI tools that enable human moderators to adapt, I think that's the best we can do.
And I'll say that there are other people who study this at Caltech. For example, Frederick Eberhardt, who is a Caltech faculty member in my division, HSS, and a philosopher. And he teaches a wonderful class here at Caltech on the ethics of AI.
And so while we're not directly collaborating with Frederick at this point, I know we both have had many conversations with him over the years. And as our research continues to grow and progress, I'm pretty confident that we'll be reaching out to scholars like Frederick here at Caltech and elsewhere to really start to incorporate more of the underlying ethics of the use of AI, and how AI should respond as social mores about things like harassment change. All right.
I've got one from an anonymous individual who asks, "When and where should we be on the lookout for your tools to become open source?" Will it be a branded product coming out that we should look for? Where will we find it?
Yeah, we are going to release this as part of an open-source library called TensorLy (tensorly.org). You can look it up there, as well as on our websites.
Mike also maintains a blog on trustworthy social media, and we'll hopefully announce it on Twitter and other social media as well. Fantastic. All righty.
I have a question from Hall Daily for Professor Alvarez. He asks, "What are two or three of the most persuasive facts regarding the integrity of federal elections?" Well, Hall, great to hear from you.
And we've missed you at Caltech. And I'm glad you're participating with us today. The integrity of American elections takes us a little bit off topic in one sense, but I will tell you a few things.
The first is that our research that we've been doing at Caltech that is actually partly connected into the trustworthy social media project has been finding consistently that American voters should be confident in our electoral system. I'd say that's one. The second is that, while our electoral system in the United States and our democracy's not perfect, it functions pretty well.
And there isn't any evidence of any systematic fraud or manipulation of the election. I would say that that's factoid two. And factoid three, which actually does come back into some of the work that we've done, is that there are thousands and thousands of people who are working every day very hard to run our elections.
And by and large, these people do a great service to democracy. And they have come under attack in recent elections. That's actually part of some of the work that we have been doing: developing tools to look at those attacks and threats that have been aimed directly at the men and women in America who are running our elections, and to see if we can, again, use these same tools and techniques that we've been talking about, in near real time, to try to detect those threats of violence and physical harm aimed at the people who are running our democracy, so that perhaps they can be prevented.
I would say those are the three main factoids I'd offer in response to your question, Hall. Fantastic. This one might get to a little bit of what you were talking about, Dr Anandkumar, with the apples and the oranges. Gary Heard asks, "Can the context of a post in a list be analyzed? I find that Facebook AI merely has a trigger list for single comments."
Yeah, you can certainly have, say, a list of known harassment terms or words that are problematic. And we can incorporate all of that.
We should use all the existing labels and knowledge about what we know is problematic, either in terms of harassment or misinformation. What these tools enable you to do, on top of that, is discover new topics and evolving vocabulary. At scale, we would be doing a combination.
We would see what the currently trained, say, toxicity filters are flagging. You do see, especially with the common lists of problematic words, that those are automatically hidden on Twitter and even Facebook these days. You still have the option to open and see them, or you can even choose in your settings whether you want them shown to you or completely muted.
There are more tools than before, which is great. And we should use all that as the first step. But then those that escape the current filters, how do we use these unsupervised learning tools, the topic detection tools, to bubble up very quickly to human moderators so we can keep up with this cat and mouse game with harassers and those spreading misinformation?
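The two-stage workflow just described, known filters first and unsupervised discovery for whatever escapes them, can be sketched as follows. Everything here is a hypothetical stand-in: the keyword list, the function names, and the simple frequency heuristic playing the role of real topic detection tools.

```python
# Hypothetical two-stage moderation sketch: a known-keyword trigger list
# catches what it can, then a crude emerging-term pass (standing in for
# unsupervised topic detection) surfaces the rest for human moderators.
from collections import Counter

KNOWN_TOXIC = {"badword1", "badword2"}  # placeholder trigger list

def first_pass(tweets):
    """Split tweets into those caught by the keyword filter and the rest."""
    flagged, remainder = [], []
    for t in tweets:
        words = set(t.lower().split())
        (flagged if words & KNOWN_TOXIC else remainder).append(t)
    return flagged, remainder

def emerging_terms(remainder, min_count=2):
    """Surface unusually frequent new terms for moderators to review."""
    counts = Counter(w for t in remainder for w in t.lower().split())
    return [w for w, c in counts.most_common() if c >= min_count]

tweets = [
    "badword1 targeted at someone",
    "new slangword appears here",
    "slangword again in another post",
]
flagged, rest = first_pass(tweets)
review_queue = emerging_terms(rest)
print(flagged, review_queue)
```

The key property is that the second stage makes no assumptions about vocabulary, so newly coined hashtags or deliberately misspelled slurs that slip past the trigger list can still bubble up once they start recurring.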
Andrew Moseman asks, "Can you talk a little bit about the difference between trying to track or monitor different social networks for misinformation? For example, is it more difficult to track TikTok, which is video-based, versus Twitter or Facebook, which are more text-based?" I'll leave that one for Anima.
That's a hard one. All these are developing fast. We do have models that can not only understand videos now, but also generate videos.
We are getting to, really, an impressive scale of being able to handle videos. But I would say the research using that to detect misinformation is still very preliminary, and that's something we want to pursue further. But yeah, I would say in general, it's the sheer scale that is even more overwhelming.
And the other is the ability to collect data at that scale, to do such research. And that's the piece that I'll speak to, which is that part of the reason why we focus so heavily on Twitter is because Twitter actually provides academics like us with relatively easy access to their data. Twitter is a very transparent platform currently, and that's part of the reason why we focus so much of our energies there.
Some of these other platforms are not, like Facebook. We do not have the same sort of ready access to streams of real time information from Facebook that we have from Twitter. I do believe that TikTok does make some of their information available for academic use.
And I know there are some research groups who have been trying to analyze the videos from TikTok, but as Anima alluded to, that research, in particular when it comes to misinformation and harassment, is very nascent. That brings up the question, then: does that impact the final product of what you are creating? Will you wind up with a tool that is much better suited for Twitter because that's what it was developed on?
Or will it be universally effective for social media platforms like Facebook and others? I would say the topic modeling framework is general. You could even go beyond text to other kinds of data.
In fact, people have used this for understanding genetic admixtures. We've used this for automatically learning about cell types from mixture data. The tools developed can be broad, and that's why, in our AI research, we want to open-source these tools, because they'll have impact in a lot of domains.
That being said, if you want to handle video data, we do have to use deep learning in some form to create the right embeddings. And in that space, we can use these topic detection tools to see if new topics arise. But yeah, that's an open research question, and an exciting one at that.
How effective will this be for video data? Anima already alluded to one way in which these tools will be used in other kinds of contexts, and that is with respect to gaming platforms. And so again, part of the reason that the gaming companies are interested in collaborating with us on these kinds of projects is that these tools are readily transferable to the problems they're seeing.
Many of these video games involve text chat, so we can use these same techniques in a very transferable way to analyze the text chat as players interact in multiplayer games. There are applications like that that are very straightforward for these technologies and have nothing to do with social media, but where they can be used in areas where we know there's a great deal of toxicity and harassment. I see.
Returning to the issue of misinformation, an anonymous attendee asks, "Can you share a few top examples of misinformation that you've discovered and spotted?" Election 2020 is a great example. During election 2020, many claims were made.
They weren't things that we necessarily discovered and revealed ourselves, but there were lots of claims being made throughout the country about election results being problematic: claims of election fraud, claims of all sorts of irregularities. Again, we weren't doing the necessary fact-checking of many of those. I'd refer the anonymous commenter to some of my colleagues in election administration who work at Stanford University. They have a very good paper that was published in PNAS.
This is Justin Grimmer and some of his colleagues. It's an open access paper; you can go and find it. They went through and attempted to debunk many of these false claims about election 2020.
Okay. Eric John asks, "How long is the list of topics that your models discover? For example, in the data set of all the #MeToo tweets, does someone have to go through all the topics manually and decide which are related to harassment and which aren't?"
We can decide how many topics we want. We can tell the algorithm, "Give me the top five topics in this set of tweets," or the top hundred. It's up to us to decide how granular we want to get and how many topics we want to go through.
But for these algorithms, the topics are represented by the dominant words that occur in them, so you see the representation in terms of words. And then it's up to the human moderators or others, I think, to decide what these topics mean and how we can make sense of the Twitter volume evolving with these topics. Okay.
I think we've got time for maybe just one more question. This gets to a little bit of what we talked about earlier, but how would we cover misinformation in languages other than English? Is that going to take the creation of a separate tool?
How does this work? And that's the beauty of this tool compared to the supervised deep learning tools that we see a lot of. With this tool, you just feed in all the data you have, which can be tweets in another language, and it will automatically figure out how often different words occur together.
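Mechanically, "how often different words occur together" means counting word co-occurrences, which requires no labels and no language-specific resources. A minimal pure-Python sketch of that counting step (toy data; an illustration, not the researchers' actual code):

```python
# Count, for every pair of words, how many tweets contain both.
# Works identically for any whitespace-tokenizable language.
from collections import Counter
from itertools import combinations

def cooccurrence_counts(tweets):
    """Return a Counter mapping sorted word pairs to co-occurrence counts."""
    pair_counts = Counter()
    for tweet in tweets:
        words = sorted(set(tweet.lower().split()))
        pair_counts.update(combinations(words, 2))
    return pair_counts

# Hypothetical Portuguese-language tweets as a stand-in corpus.
tweets = ["fraude eleitoral urna", "urna fraude recontagem", "assédio bloqueio"]
counts = cooccurrence_counts(tweets)
print(counts[("fraude", "urna")])  # appears together in 2 tweets
```

Statistics like these feed the topic model; nothing in the counting depends on English.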
And so this is agnostic to the language, and I think that's really the power of this tool: we can use it in low-resource languages where there aren't many trained toxicity filters. Excellent. All right, well, Anima and Michael, thank you so much for being here.
And thanks to everybody for joining us. If you are interested in learning more about AI, I'd encourage you to check out the Caltech Science Exchange. The site provides clear explanations of not just artificial intelligence, but also other topics in science and technology, such as quantum physics and climate change.
You can find it at scienceexchange.caltech.edu.
Thank you. Thank you so much. Thanks.
Bye, everyone.