Online content moderation

There is No War in Ba Sing Se: A Global Analysis of Content Moderation in Large Language Models

Investigators: Friedemann Lipphardt, Moonis Ali, Martin Banzer, Anja Feldmann in cooperation with Devashish Gosain (IIT Bombay, India) 

The recent growth of GenAI applications like ChatGPT and Gemini indicates that LLMs are increasingly becoming the de-facto primary information source for many users across the globe. It is therefore critical to examine whether the information offered by these systems is consistent and accurate. Given this important research frontier, we perform the first comprehensive analysis of content moderation patterns detected in over 700,000 replies from 15 leading LLMs, evaluated from 12 locations using 1,118 sensitive queries spanning five categories in 13 languages. We find substantial geographic variation, with moderation rates showing relative differences of up to 60% across locations; for instance, soft moderation (e.g., evasive replies) appears in 14.3% of German contexts versus 24.9% of Zulu contexts. Category-wise, misc. (generally unsafe), hate speech, and sexual content are more heavily moderated than political or religious content, with political content showing the most geographic variability. We also observe discrepancies between online and offline model versions, such as DeepSeek exhibiting a 15.2% higher relative soft moderation rate when deployed locally than via API. Our analysis of response length (and time) reveals that moderated responses are, on average, about 50% shorter than unmoderated ones. These findings have important implications for AI fairness and digital equity, as users in different locations receive inconsistent access to information. We provide the first systematic evidence of geographic and cross-language bias in LLM content moderation and showcase how model selection vastly impacts user experience.
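
To illustrate one step of such a measurement pipeline, the following minimal Python sketch (ours, not the study's released code) labels collected responses as soft-moderated via a simple refusal-phrase heuristic and aggregates rates per vantage point; the marker list, record fields, and sample data are illustrative assumptions.

```python
# A minimal sketch, not the study's pipeline: label responses as soft-moderated
# via a refusal-phrase heuristic and aggregate per location.
from dataclasses import dataclass
from statistics import mean

# Hypothetical English refusal markers; the actual study spans 13 languages.
REFUSAL_MARKERS = ["i can't help with", "i cannot assist", "i'm unable to"]

@dataclass
class Record:
    location: str  # vantage point the model was queried from, e.g. "DE"
    response: str  # model reply to a sensitive query

def is_soft_moderated(text: str) -> bool:
    t = text.lower()
    return any(marker in t for marker in REFUSAL_MARKERS)

def moderation_rate_by_location(records: list[Record]) -> dict[str, float]:
    flags_by_loc: dict[str, list[bool]] = {}
    for r in records:
        flags_by_loc.setdefault(r.location, []).append(is_soft_moderated(r.response))
    return {loc: sum(flags) / len(flags) for loc, flags in flags_by_loc.items()}

data = [
    Record("DE", "Here is a balanced overview of the topic ..."),
    Record("ZA", "I'm unable to discuss that."),
    Record("ZA", "I cannot assist with this request."),
]
print(moderation_rate_by_location(data))   # e.g. {'DE': 0.0, 'ZA': 1.0}
moderated = [len(r.response) for r in data if is_soft_moderated(r.response)]
unmoderated = [len(r.response) for r in data if not is_soft_moderated(r.response)]
print(mean(moderated), mean(unmoderated))  # length comparison, cf. the ~50% finding
```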

References:
Lipphardt, F., Ali, M., Banzer, M., Feldmann, A. and Gosain, D. 2026. There is No War in Ba Sing Se: A Global Analysis of Content Moderation in Large Language Models. To appear in the Proceedings of NDSS 2026.


From Isolation to Desolation: Investigating Self-Harm Discussions in Incel Communities

Investigators: Moonis Ali in cooperation with Savvas Zannettou (TU Delft)

Understanding online sub-cultures, especially extremely sensitive communities, is often a challenging task. Researchers therefore often rely on data-driven techniques to understand the prevalence of specific themes in these communities. To this end, we design a comparative study that focuses on self-harm-related linguistic signatures in Incel communities on the Internet, compared against mainstream Reddit mental-health communities. We observe that, over time, language related to self-harm evolves considerably more among Incels than in mainstream communities. We also observe that negative perception of their physical appearance is the most recurrent theme in Incels' self-harm conversations, a theme that does not feature in mainstream communities. Finally, by analyzing social factors, we find that substance abuse is the social factor most closely associated with self-harm in both Incel and mainstream communities, and that physical appearance is, over time, becoming increasingly closely related to self-harm discussions in Incel communities.
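
As a concrete illustration of this kind of prevalence analysis, the minimal sketch below tracks the yearly fraction of posts matching a small self-harm lexicon; the lexicon, data layout, and sample posts are our own placeholder assumptions, not the study's actual method.

```python
# A minimal sketch, not the study's method: yearly prevalence of posts matching
# a tiny self-harm lexicon. The lexicon and the sample posts are placeholders.
import re
from collections import defaultdict

LEXICON = re.compile(r"\b(self[- ]?harm|hurting myself)\b", re.IGNORECASE)

def yearly_prevalence(posts):
    """posts: iterable of (year, text); returns fraction of matching posts per year."""
    hits, totals = defaultdict(int), defaultdict(int)
    for year, text in posts:
        totals[year] += 1
        hits[year] += bool(LEXICON.search(text))
    return {year: hits[year] / totals[year] for year in sorted(totals)}

print(yearly_prevalence([
    (2018, "just venting about work"),
    (2019, "been thinking about self-harm lately"),
    (2019, "another ordinary day"),
]))  # -> {2018: 0.0, 2019: 0.5}
```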

References:
Ali, M. and Zannettou, S. 2024. From Isolation to Desolation: Investigating Self-Harm Discussions in Incel Communities. Proceedings of the International AAAI Conference on Web and Social Media. 18, 1 (May 2024), 43-56. DOI:https://doi.org/10.1609/icwsm.v18i1.31296.


Online content moderation

Investigators: Savvas Zannettou in cooperation with Shagun Jhaver (Rutgers University, USA), Jeremy Blackburn (Binghamton University, USA), Emiliano De Cristofaro (University College London, United Kingdom), Gianluca Stringhini (Boston University, USA), Robert West (EPFL, Switzerland), Krishna P. Gummadi (MPI-SWS, Germany)

Analyzing content moderation online is important for several reasons. First, social media and other online platforms have become major sources of information and communication, shaping public discourse and opinions. The content shared on these platforms can have significant consequences for individuals, communities, and society as a whole. Thus, understanding the challenges and complexities of moderating online content is crucial for ensuring that these platforms are safe and inclusive spaces for all users. Second, content moderation involves a range of technical, social, and ethical issues that require interdisciplinary expertise. Studying content moderation online involves understanding the technical mechanisms used to identify and remove harmful content, as well as the social and cultural contexts in which these mechanisms operate. Moreover, content moderation online is a constantly evolving field, as new technologies and social dynamics emerge. In this line of work, our goal is to analyze and understand multiple aspects of content moderation, including how soft moderation interventions (i.e., warning labels) are applied online and how effective they are [4, 2], what happens after online platforms take moderation action against specific online communities (i.e., community bans) [1], and how we can design systems to automatically identify accounts that are state-sponsored trolls involved in misinformation campaigns online [3].

Soft Moderation Interventions
Over the past few years, there has been a heated debate and serious public concern regarding online content moderation, censorship, and the principle of free speech on the Web. To ease these concerns, social media platforms like Twitter, Facebook, and TikTok refined their content moderation systems to support soft moderation interventions. Soft moderation interventions refer to warning labels attached to potentially questionable or harmful content to inform other users about the content and its nature while the content remains accessible, hence alleviating concerns related to censorship and free speech. In our work, we performed one of the first empirical studies on how soft moderation interventions are applied on Twitter [4] and TikTok [2]. In particular, our work on Twitter uses a mixed-methods approach to study the users who share tweets with warning labels on Twitter and their political leaning, the engagement that these tweets receive, and how users interact with tweets that have warning labels. Among other things, we find that 72% of the tweets with warning labels are shared by Republicans, while only 11% are shared by Democrats. By analyzing content engagement, we find that tweets with warning labels receive more engagement than tweets without warning labels. We also qualitatively analyze how users interact with content that has warning labels, finding that the most popular interactions are further debunking false claims, mocking the author or content of the disputed tweet, and further reinforcing or resharing false claims. Finally, we describe concrete examples of inconsistencies, such as warning labels that are incorrectly added or warning labels that are missing from tweets despite those tweets sharing questionable and potentially harmful information.

Our work on TikTok [2] focuses on the important problem of soft moderation interventions during important health-related events like the COVID-19 pandemic. In particular, we analyze the use of warning labels on TikTok, focusing on COVID-19 videos. First, we construct a set of 26 COVID-19-related hashtags and collect 41K videos that include those hashtags in their description. Second, we perform a quantitative analysis on the entire dataset to understand the use of warning labels on TikTok. Then, we perform an in-depth qualitative study, using thematic analysis, on 222 COVID-19-related videos to assess the content and the connection between the content and the warning labels. Our analysis shows that TikTok applies warning labels broadly, likely based on hashtags included in the description (e.g., 99% of the videos that contain #coronavirus have warning labels). More worrying is the addition of COVID-19 warning labels to videos whose actual content is not related to COVID-19 (23% of the cases in a sample of 143 English videos that are not related to COVID-19). Finally, our qualitative analysis of a sample of 222 videos shows that 7.7% of the videos share misinformation/harmful content without a warning label, 37.3% share benign information yet include warning labels, and 35% of the videos that share misinformation/harmful content (and need a warning label) are made for fun. Our study demonstrates the need to develop more accurate and precise soft moderation systems, especially on a platform like TikTok, which is extremely popular among younger audiences.
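
As an illustration of the quantitative step, the minimal sketch below computes the kind of per-hashtag label rate behind figures such as the 99% statistic above; the record fields ("hashtags", "has_label") and the sample data are our assumptions, not the actual dataset schema.

```python
# A minimal sketch of a per-hashtag warning-label tally; fields are illustrative.
from collections import Counter

def label_rate_per_hashtag(videos):
    with_label, total = Counter(), Counter()
    for v in videos:
        for tag in v["hashtags"]:
            total[tag] += 1
            with_label[tag] += v["has_label"]  # True counts as 1
    return {tag: with_label[tag] / total[tag] for tag in total}

sample = [
    {"hashtags": {"coronavirus"}, "has_label": True},
    {"hashtags": {"coronavirus", "fyp"}, "has_label": True},
    {"hashtags": {"fyp"}, "has_label": False},
]
print(label_rate_per_hashtag(sample))  # -> {'coronavirus': 1.0, 'fyp': 0.5}
```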

Community Bans
When toxic online communities on mainstream platforms face moderation measures, such as bans, they may migrate to other platforms with laxer policies or set up their own dedicated websites. Previous work suggests that within mainstream platforms, community-level moderation is effective in mitigating the harm caused by the moderated communities. It is, however, unclear whether these results also hold when considering the broader Web ecosystem. Do toxic communities continue to grow in terms of their user base and activity on the new platforms? Do their members become more toxic and ideologically radicalized? In our work [1], we report the results of a large-scale observational study of how problematic online communities progress following community-level moderation measures. We analyze data from r/The_Donald and r/Incels, two communities that were banned from Reddit and subsequently migrated to their own standalone websites. Our results suggest that, in both cases, moderation measures significantly decreased posting activity on the new platform, reducing the number of posts, active users, and newcomers. In spite of that, users in one of the studied communities (r/The_Donald) showed increases in signals associated with toxicity and radicalization, which justifies concerns that the reduction in activity may come at the expense of a more toxic and radical community. Overall, our results paint a nuanced portrait of the consequences of community-level moderation and can inform their design and deployment.
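
To make the activity measurement concrete, here is a minimal sketch of pre/post-migration metrics (posts, active users, newcomers); this is a simplification of the paper's observational design, and the field names and sample data are our own assumptions.

```python
# A minimal sketch of pre/post-ban community activity metrics; illustrative only.
from datetime import date

def activity_by_period(posts, ban_date):
    """posts: iterable of (date, author); splits activity before/after ban_date."""
    stats = {"pre": {"posts": 0, "users": set()},
             "post": {"posts": 0, "users": set()}}
    for day, author in posts:
        period = "pre" if day < ban_date else "post"
        stats[period]["posts"] += 1
        stats[period]["users"].add(author)
    newcomers = stats["post"]["users"] - stats["pre"]["users"]  # first seen after the ban
    return {"posts": (stats["pre"]["posts"], stats["post"]["posts"]),
            "active_users": (len(stats["pre"]["users"]), len(stats["post"]["users"])),
            "newcomers_post": len(newcomers)}

print(activity_by_period(
    [(date(2019, 1, 5), "a"), (date(2019, 3, 2), "a"), (date(2019, 3, 3), "b")],
    ban_date=date(2019, 2, 1)))
```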

Detecting State-Sponsored trolls
Growing evidence points to recurring influence campaigns on social media, often sponsored by state actors aiming to manipulate public opinion on sensitive political topics. Typically, campaigns are performed through instrumented accounts, known as troll accounts; despite their prominence, however, little work has been done to detect these accounts in the wild. In our work [3], we present TROLLMAGNIFIER, a detection system for troll accounts. Our key observation, based on analysis of known Russian-sponsored troll accounts identified by Reddit, is that they show loose coordination, often interacting with each other to further specific narratives. Therefore, troll accounts controlled by the same actor often show similarities that can be leveraged for detection. TROLLMAGNIFIER learns the typical behavior of known troll accounts and identifies additional accounts that behave similarly. We train TROLLMAGNIFIER on a set of 335 known troll accounts and run it on a large dataset of Reddit accounts. Our system identifies 1,248 potential troll accounts; we then provide a multi-faceted analysis to corroborate the correctness of our classification. In particular, 66% of the detected accounts show signs of being instrumented by malicious actors (e.g., they were created on the same exact day as a known troll, they have since been suspended by Reddit, etc.). They also discuss similar topics as the known troll accounts and exhibit temporal synchronization in their activity. Overall, we show that by using TROLLMAGNIFIER, one can grow the initial knowledge of potential trolls provided by Reddit by over 300%. We argue that our system can be used for identifying and moderating content originating from state-sponsored accounts that aim to perform influence campaigns on social media platforms.
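
As a rough illustration of similarity-based scoring, the minimal sketch below scores a candidate account on two of the signals discussed above (interactions with known trolls and creation-date proximity); it is not TROLLMAGNIFIER itself, and the fields, weights, and threshold are illustrative assumptions.

```python
# A minimal sketch, not TROLLMAGNIFIER: score a candidate account against
# known trolls using two behavioral signals. Weights are arbitrary.
from datetime import date

def troll_score(account, known_trolls):
    """account: {"created": date, "replied_to": set of usernames};
    known_trolls: mapping username -> creation date."""
    interactions = len(account["replied_to"] & known_trolls.keys())
    same_day = account["created"] in known_trolls.values()
    return interactions + (5 if same_day else 0)

known = {"troll_a": date(2016, 5, 1), "troll_b": date(2016, 5, 1)}
candidate = {"created": date(2016, 5, 1), "replied_to": {"troll_a", "user_x"}}
print(troll_score(candidate, known))  # 6; flag if above a tuned threshold
```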

References
• [1] M. Horta Ribeiro, S. Jhaver, S. Zannettou, J. Blackburn, E. De Cristofaro, G. Stringhini, and R. West. Do Platform Migrations Compromise Content Moderation? Evidence from r/The_Donald and r/Incels, 2020. arXiv: 2010.10397.
• [2] C. Ling, K. Gummadi, and S. Zannettou. "Learn the Facts About COVID-19": Analyzing the Use of Warning Labels on TikTok Videos, 2022. arXiv: 2201.07726.
• [3] M. H. Saeed, S. Ali, J. Blackburn, E. De Cristofaro, S. Zannettou, and G. Stringhini. TROLLMAGNIFIER: Detecting state-sponsored troll accounts on Reddit. In 43rd IEEE Symposium on Security and Privacy (SP 2022), San Francisco, CA, USA, 2022, pp. 2161–2175. IEEE.
• [4] S. Zannettou. “I won the election!”: An empirical analysis of soft moderation interventions on Twitter. In Proceedings of the Fifteenth International Conference on Web and Social Media (ICWSM 2021), Atlanta, GA, USA, 2021, pp. 865–876. AAAI.