Carnegie Mellon Continues its Research on "Hostility-Diffusing, Peace-Seeking Hope Speech"
By John K. Waters
Researchers at Carnegie Mellon's Language Technologies Institute (LTI) have published a new report on their ongoing work using artificial intelligence (AI) to analyze online comments on social media and pick out those that defend or are sympathetic to disenfranchised groups.
The new report, "Harnessing Code Switching to Transcend the Linguistic Barrier," focuses on code switching, also called "code mixing," a phenomenon common in social-media content generated by linguistically diverse user bases, such as those on the Indian subcontinent.
Typically viewed as an impediment to downstream analyses in the current literature on Natural Language Processing (NLP), code switching becomes a force for good in this paper. It is a novel proposition, to be sure, but not to the paper's authors, Ashiqur R. KhudaBukhsh, Shriphani Palakodety, and Jaime G. Carbonell, who published their research in this area earlier this year. In this report, they document how they utilized code switching as "a bridge between a resource-rich and a low-resource language to reduce annotation efforts in the latter, while leveraging resources tailored to the former."
"Our approach is appealing for its minimal supervision requirements," they wrote.
"In the context of hostility diffusing hope speech comments, our methods can be used to broaden the reach of such content overcoming the varied language skills of linguistically diverse regions and transcending language barriers… Our method holds significant promise in addressing resource gaps across widely used languages…"
The LTI researchers published their earlier paper, "Voice For the Voiceless: Active Sampling to Detect Comments Supporting the Rohingyas," in January. (It's currently available online.) They focused their initial efforts on finding supportive content about the Rohingya people, who began fleeing Myanmar in 2017 to avoid ethnic cleansing. Left to themselves, the Rohingya are largely defenseless against online hate speech, one of the report's authors, Ashiqur R. KhudaBukhsh, a post-doctoral researcher in the LTI, explained at the time. Many of them have limited proficiency in languages such as English, and they have little access to the Internet. And "most are too busy trying to stay alive to spend much time posting their own content."
The researchers used their system to sort out the helpful from the hateful in hundreds of thousands of comments on social media, easily outpacing the work of typical human social media moderators.
I spoke with KhudaBukhsh when that first report was published, and he reached out via email about this new paper, which was set for presentation at the 29th International Joint Conference on Artificial Intelligence (IJCAI).
"Code switching, seamless alternation of multiple languages within the same document boundary, is a linguistic phenomenon highly common in linguistically diverse regions," KhudaBukhsh told Pure AI in an email. "While this phenomenon has been researched by linguists for 50 years and computational linguists for several decades, typically, code switching is looked at as an impediment to NLP. In this work, we show for the first time that code switching can be harnessed for social good, and we can perform cross-lingual sampling of hope speech in Hindi. Additionally, we show how to detect comments encouraging compliance to COVID-19 health guidelines in low resource language."
This is fascinating research, and KhudaBukhsh and company appear to be on to a significant rethinking of how we deal with online hate speech.
About the Author
John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at firstname.lastname@example.org.