Online hate speech & conspiracy theories

Analyzing Antisemitism and Islamophobia using a Lexicon-based Approach

The spread of Antisemitic and Islamophobic content is a longstanding problem, in particular within fringe Web communities. In this work, we analyze the spread of Antisemitic and Islamophobic content on 4chan’s Politically Incorrect board (/pol/) using a lexicon-based approach. We use an openly accessible knowledge graph, word embedding techniques that allow us to assess semantic similarity between terms, and manual annotations to create two lexicons: one of 48 Antisemitic terms and another of 135 Islamophobic terms. Then, by extracting all posts containing these terms from /pol/, we assess their popularity and veracity (i.e., what percentage of posts that contain these terms are actually Antisemitic/Islamophobic). We find that 93% and 81% of posts that contain terms from our lexicons are Antisemitic and Islamophobic, respectively. We also find that the veracity and frequency of use of these terms vary greatly on 4chan’s /pol/. Finally, using topic modeling, we provide an overview of how popular Antisemitic and Islamophobic terms are used on 4chan’s /pol/. To conclude, we make our lexicons of Antisemitic and Islamophobic terms publicly available; they are likely to be useful for researchers working on Antisemitism/Islamophobia or hate speech in general.
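The two measurements described above, popularity (how many posts contain a term) and veracity (what share of those posts were judged hateful), can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the whole-word, case-insensitive matching rule, the function names, and the assumption that every post already carries a manual label are all hypothetical choices made for illustration.

```python
import re


def build_matcher(lexicon):
    """Compile one case-insensitive pattern matching any lexicon term as a whole word."""
    # Longest terms first, so a multi-word term is preferred over a shorter prefix.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


def matched_terms(post, matcher):
    """Return the set of lexicon terms (lowercased) that occur in a post."""
    return {m.group(0).lower() for m in matcher.finditer(post)}


def term_stats(posts, labels, lexicon):
    """Return {term: (popularity, veracity)}.

    popularity: number of posts containing the term.
    veracity:   fraction of those posts whose label is True.
    labels[i] is assumed to be a manual judgment for posts[i] (hypothetical input).
    """
    matcher = build_matcher(lexicon)
    counts, hateful = {}, {}
    for post, label in zip(posts, labels):
        for term in matched_terms(post, matcher):
            counts[term] = counts.get(term, 0) + 1
            hateful[term] = hateful.get(term, 0) + int(bool(label))
    return {t: (counts[t], hateful[t] / counts[t]) for t in counts}
```

Calling `term_stats(posts, labels, lexicon)` with placeholder terms returns one `(popularity, veracity)` pair per matched term, which makes it easy to see that a term can be frequent yet have low veracity, or the reverse.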

Reference: Proceedings of the 16th International AAAI Conference on Web and Social Media (2022) DOI: 10.36190/2022.61

A Quantitative Approach to Understanding Online Antisemitism (ICWSM 2020)

Investigators: Savvas Zannettou (TU Delft), Joel Finkelstein (Princeton University), Barry Bradlyn (UIUC), Jeremy Blackburn (Binghamton University)

In this work, we study online antisemitism on two fringe Web communities, namely, 4chan’s Politically Incorrect board (/pol/) and Gab. We propose a quantitative approach to understanding online antisemitism, which can also be applied to study other forms of hate speech on the Web. We leverage word embeddings and graph analysis techniques to visualize topics of discussion and automatically discover new slurs related to online antisemitism. Overall, and alarmingly, we find a rise in antisemitic rhetoric and antisemitic memes over time on both 4chan’s /pol/ and Gab.
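One common way to surface candidate terms from word embeddings is to rank the vocabulary by cosine similarity to a known seed term. The sketch below shows that idea only; the description above does not specify the exact procedure used in the paper, and the function names and the toy in-memory `embeddings` dictionary are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_terms(seed, embeddings, k=3):
    """Return the k terms whose vectors are most similar to the seed term's vector.

    embeddings is a hypothetical {term: vector} mapping, e.g. exported from a
    word2vec-style model trained on the community's posts.
    """
    seed_vec = embeddings[seed]
    scored = [
        (cosine(seed_vec, vec), term)
        for term, vec in embeddings.items()
        if term != seed
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [term for _, term in scored[:k]]
```

Terms returned this way are only candidates; they would still need manual review before being treated as slurs.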

Reference: https://arxiv.org/abs/1809.01644

"Go Eat a Bat, Chang!": An Early Look on the Emergence of Sinophobic Behavior on Web Communities in the Face of Covid-19

Investigators: Leonard Schild (CISPA), Chen Ling (Boston University), Jeremy Blackburn (Binghamton University), Gianluca Stringhini (Boston University), Yang Zhang (CISPA), Savvas Zannettou

The COVID-19 pandemic has changed our lives in an unprecedented way. During these challenging times, the Web is an indispensable medium. However, it can also be exploited to disseminate hateful content, such as Sinophobic content, since the virus is believed to originate from China. In this work, we investigate the spread of Sinophobic content on 4chan and Twitter and assess whether the COVID-19 pandemic has led to a surge in online Sinophobia. We find that the COVID-19 pandemic indeed caused an increase in the use of Sinophobic slurs on both 4chan and Twitter. We also find differences between the two platforms: on Twitter we observed a shift towards blaming China for the pandemic, while on 4chan we observed a shift towards using more, and new, Sinophobic slurs.

Reference: https://arxiv.org/abs/2004.04046

Measuring and Characterizing Hate Speech on News Websites

Investigators: Savvas Zannettou, Mai ElSherief (Georgia Institute of Technology), Elizabeth Belding (UCSB), Shirin Nilizadeh (University of Texas at Arlington), Gianluca Stringhini (Boston University)

In this work, we aim to measure the prevalence of hateful content in comments posted on news articles, and to determine whether the appearance of news articles on 4chan and Reddit leads to changes in (hateful) commenting activity. By collecting a large corpus of comments from news articles and annotating them using Google’s Perspective API, we find that there is an increase in hateful commenting activity shortly after the news articles are posted on Reddit and 4chan.

Reference: https://arxiv.org/abs/2005.07926

Understanding and Detecting Hateful Content using Contrastive Learning

Investigators: Savvas Zannettou in cooperation with Felipe González-Pizarro (University of British Columbia, Canada)

The spread of hate speech and hateful imagery on the Web is a significant problem that needs to be mitigated to improve our Web experience. The problem of hateful content is longstanding on the Web for various reasons. First, there is no scientific consensus on what constitutes hateful content (i.e., no definition of what hate speech is). Second, the problem is complex since hateful content can spread across various modalities (e.g., text, images, videos, etc.), and we still lack automated techniques to detect hateful content with acceptable and generalizable performance. Third, we lack moderation tools to proactively prevent the spread of hateful content on the Web. Our work [1] focuses on assisting the community in addressing the issue of the lack of tools to detect hateful content across multiple modalities. In particular, we contribute to research efforts to detect and understand hateful content on the Web by undertaking a multimodal analysis of Antisemitism and Islamophobia on 4chan’s /pol/ using OpenAI’s CLIP. This large pre-trained model uses the Contrastive Learning paradigm. We devise a methodology to identify a set of Antisemitic and Islamophobic hateful textual phrases using Google’s Perspective API and manual annotations. Then, we use OpenAI’s CLIP to identify images that are highly similar to our Antisemitic/Islamophobic textual phrases. By running our methodology on a dataset that includes 66M posts and 5.8M images shared on 4chan’s /pol/ for 18 months, we detect 173K posts containing 21K Antisemitic/Islamophobic images and 246K posts that include 420 hateful phrases. Among other things, we find that we can use OpenAI’s CLIP model to detect hateful content with an accuracy score of 0.81 (F1 score = 0.54). By comparing CLIP with two baselines proposed by the literature, we find that CLIP outperforms them, in terms of accuracy, precision, and F1 score, in detecting Antisemitic/Islamophobic images. Also, we find that Antisemitic/Islamophobic imagery is shared in a similar number of posts on 4chan’s /pol/ compared to Antisemitic/Islamophobic textual phrases, highlighting the need to design more tools for detecting hateful imagery. Finally, we make available (upon request) a dataset of 246K posts containing 420 Antisemitic/Islamophobic phrases and 21K likely Antisemitic/Islamophobic images (automatically detected by CLIP) that can assist researchers in further understanding Antisemitism and Islamophobia.
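Accuracy and F1 can diverge considerably when hateful items are a minority of the evaluated data, which is one plausible reading of the two scores reported above. The following Python sketch shows how these metrics are computed from binary labels and predictions. The function name and any labels passed to it are hypothetical; it does not reproduce the evaluation data of the paper.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = hateful)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # hateful, flagged
    fp = sum(1 for t, p in pairs if not t and p)      # benign, flagged
    fn = sum(1 for t, p in pairs if t and not p)      # hateful, missed
    tn = sum(1 for t, p in pairs if not t and not p)  # benign, not flagged

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Because true negatives count towards accuracy but not towards F1, a detector evaluated on mostly benign items can score high on accuracy while its F1 stays much lower.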

References
• [1] F. González-Pizarro and S. Zannettou. Understanding and Detecting Hateful Content using Contrastive Learning, 2022. arXiv: 2201.08387.

On the Globalization of the QAnon Conspiracy Theory Through Telegram

Investigators: Mohamad Hoseini, Savvas Zannettou, and Anja Feldmann in cooperation with Philipe Melo (Federal University of Minas Gerais), and Fabrício Benevenuto (Federal University of Minas Gerais)

QAnon is a far-right conspiracy theory that has implications in the real world, with supporters of the theory participating in real-world violent acts like the US Capitol attack in 2021. At the same time, the QAnon theory started evolving into a global phenomenon by attracting followers across the globe and, in particular, in Europe. It is therefore imperative to understand how QAnon has become a worldwide phenomenon and how this dissemination has been happening in the online space. This paper performs a large-scale data analysis of QAnon through Telegram by collecting 4.4M messages posted in 161 QAnon groups/channels. Using Google’s Perspective API, we analyze the toxicity of QAnon content across languages and over time. Also, using a BERT-based topic modeling approach, we analyze the QAnon discourse across multiple languages. Among other things, we find that the German language is prevalent in our QAnon dataset, even overshadowing English after 2020. We also find that content posted in German and Portuguese tends to be more toxic than content posted in English. Our topic modeling indicates that QAnon supporters discuss various topics of interest within far-right movements, including world politics, conspiracy theories, COVID-19, and the anti-vaccination movement. Taken all together, we perform the first multilingual study on QAnon through Telegram and paint a nuanced overview of the globalization of QAnon [1, 2].
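Comparing toxicity across languages and over time comes down to grouping per-message scores and averaging them. A minimal Python sketch of that aggregation follows. It assumes toxicity scores have already been obtained (for example from a toxicity classifier) and that each message carries a language code and a month string; the tuple layout and function name are hypothetical.

```python
from collections import defaultdict


def mean_toxicity(messages):
    """Average toxicity per (language, month).

    messages is an iterable of (language, month, toxicity) tuples, where
    toxicity is a precomputed score in [0, 1] (assumed input format).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for language, month, score in messages:
        key = (language, month)
        sums[key] += score
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Plotting the resulting values per language against the month gives a simple view of how toxicity develops over time in each language.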

References
• [1] M. Hoseini, P. Melo, F. Benevenuto, A. Feldmann, and S. Zannettou. On the Globalization of the QAnon Conspiracy Theory Through Telegram, 2021. arXiv: 2105.13020.
• [2] M. Hoseini, P. Melo, F. Benevenuto, A. Feldmann, and S. Zannettou. On the Globalization of the QAnon Conspiracy Theory Through Telegram. In Proceedings of the 15th ACM Web Science Conference 2023 (WebSci ’23), New York, NY, USA, 2023, pp. 75–85. Association for Computing Machinery.