Carnegie Mellon Uses AI To Counter Hate Speech with 'Hope Speech'
- By John K. Waters
Researchers at Carnegie Mellon's Language Technologies Institute (LTI) say they have developed a system that uses AI to analyze online comments on social media and pick out those that defend or are sympathetic to disenfranchised groups.
It's still in development, but the LTI researchers say they've used the system to sort out the helpful from the hateful in hundreds of thousands of comments on social media, easily outpacing the work of typical human social media moderators.
The purpose of finding and highlighting these positive comments, rather than just filtering hostile content, was to employ a different strategy to make the Internet a safer, healthier place, said Ashiqur R. KhudaBukhsh, a post-doctoral researcher in the LTI, who conducted the research with alumnus Shriphani Palakodety.
"The current discourse moderation tools on social media platforms focus on minimizing hate speech through deletion of hostile content and flagging belligerent members," KhudaBukhsh told Pure AI. "What we suggest is that, along with those very important practices, the automatic identification of user-generated Web content that champions the cause of a group can be equally useful."
Analyzing large bodies of text for content and opinion has become possible thanks to recent improvements in language models, said Jaime Carbonell, LTI director and a co-author on the study. These models learn from examples so they can predict what words are likely to occur in a given sequence, which helps machines understand what speakers and writers are trying to say.
The LTI researchers focused their initial efforts on finding supporting content about the Rohingya people, who began fleeing Myanmar in 2017 to avoid ethnic cleansing. Left to themselves, the Rohingya are largely defenseless against online hate speech, KhudaBukhsh explained. Many of them have limited proficiency in languages such as English, and they have little access to the Internet. And "most are too busy trying to stay alive to spend much time posting their own content."
"We chose the Rohingyas because they represent one of the biggest humanitarian crises happening in modern times," KhudaBukhsh said. "More than 700,000 people were rendered homeless in the last decade. And we didn't see much machine learning research happening on this crisis. I suspect the reason for that is the language separation is such a challenging task."
In fact, the LTI researchers faced multiple challenges -- and not just at the language level.
"Whenever you want to build a machine learning classifier, you need a substantial number of positive examples and negative examples," KhudaBukhsh explained. "And in this case, the positives were already so rare [that] it became a kind of circular problem: We needed enough positives to build a classifier to find positives. That was one of the major technical challenges."
For language identification, the researchers used polyglot embeddings built with fastText, a library for text representation and classification. They also developed an original strategy they call "active sampling," which uses nearest neighbors in the comment-embedding space to build a classifier that can detect comments defending the Rohingyas among much larger numbers of disparaging and neutral comments.
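The article doesn't spell out how active sampling works in detail. As a rough illustration of the general idea of mining nearest neighbors in an embedding space, here is a minimal sketch, assuming each comment has already been turned into a vector (for example, by a fastText model) and that a handful of known supportive comments serve as seeds. The function names `cosine` and `nearest_neighbor_candidates`, the use of cosine similarity, and the toy vectors are all hypothetical choices made for this example, not the researchers' code.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def nearest_neighbor_candidates(
    seeds: List[Sequence[float]],
    pool: Dict[str, Sequence[float]],
    k: int,
) -> List[str]:
    """Hypothetical helper: return up to k comment IDs from an unlabeled pool
    whose embeddings are most similar to any seed (known positive) embedding."""
    if not seeds:
        raise ValueError("need at least one seed embedding")
    scored = []
    for comment_id, vec in pool.items():
        # Score each unlabeled comment by its closest seed.
        best = max(cosine(vec, s) for s in seeds)
        scored.append((best, comment_id))
    # Highest similarity first; ties broken by ID for a stable order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [comment_id for _, comment_id in scored[:k]]


# Toy usage with made-up 2-D "embeddings".
seeds = [[1.0, 0.0]]
pool = {"a": [0.9, 0.1], "b": [0.0, 1.0], "c": [0.7, 0.7]}
print(nearest_neighbor_candidates(seeds, pool, k=2))  # → ['a', 'c']
```

In a workflow like this, the surfaced candidates would be checked by a person, the confirmed positives added to the seed set, and the enlarged set used to train a classifier. Concentrating labeling effort on likely positives is one way around the circular problem KhudaBukhsh describes, where positives are too rare to find by random sampling.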
The LTI researchers applied their system to more than a quarter of a million YouTube comments in what they say is the first AI-focused analysis of the Rohingya refugee crisis.
They also used the technology to search for anti-war "hope speech" among nearly a million YouTube comments surrounding the February 2019 Pulwama terror attack in Kashmir, which inflamed the longstanding India-Pakistan dispute over the region.
"Our biggest hope is that our work will generate interest in the research community about this direction of amplifying the positives," KhudaBukhsh said. "And we also hope that the social media giants like YouTube, Twitter and Facebook, who are facing these moderation challenges, will pay attention to this work and consider the potential of hope or help speech."
The researchers plan to present their work at the Association for the Advancement of Artificial Intelligence (AAAI) annual conference (Feb. 7-12) in New York City. The research is described in a paper, "Voice For the Voiceless: Active Sampling to Detect Comments Supporting the Rohingyas," available online.
John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS.