The Web consists of numerous Web communities, news sources, and services, which are often used by various actors for potentially nefarious purposes. At the same time, the overhead of creating new platforms and communities has shrunk significantly over the past years, so the Web is becoming a bigger and more complex ecosystem (e.g., users can easily create new subreddits on Reddit). These communities and platforms do not exist in a vacuum: users can have accounts on multiple platforms and share information from one platform to another. It is therefore imperative to obtain a cross-platform understanding of the Web, quantify the role and influence of emerging Web communities and platforms, and assess the fundamental differences across platforms and communities. In this line of research, we aim to study emerging platforms and Web communities that can have a large impact on both the online and offline worlds, and to contribute to the research community by making datasets from social networks publicly available.
Demystifying the Messaging Platforms' Ecosystem Through the Lens of Twitter (IMC 2020)
Authors: Mohamad Hoseini, Philipe Melo, Manoel Junior, Fabricio Benevenuto (UFMG), Balakrishnan Chandrasekaran, Anja Feldmann, Savvas Zannettou
Short description: Messaging platforms like WhatsApp, Telegram, and Discord play an increasingly important role in addressing people’s communication needs. In this work, we aim to understand these messaging platforms by discovering publicly available groups shared on Twitter. We focus on understanding how these groups change in composition and activity over time and whether messaging platforms expose sensitive personal data. Among other things, we find that all messaging platforms expose some kind of sensitive data, with exposure being more prevalent on WhatsApp than on Telegram and Discord.
The Evolution of the Manosphere Across the Web (ICWSM 2021)
Authors: Manoel Horta Ribeiro (EPFL), Jeremy Blackburn (Binghamton University), Barry Bradlyn (UIUC), Emiliano De Cristofaro (UCL), Gianluca Stringhini (Boston University), Summer Long, Stephanie Greenberg (Binghamton University), Savvas Zannettou
Paper Link: https://arxiv.org/abs/2001.07600
Short description: In this work, we study the Manosphere, a conglomerate of Web communities that focus on discussing men’s issues. We investigate how the Manosphere has evolved over time by collecting a large-scale dataset from Reddit and several standalone forums. We find that milder communities, like Pick Up Artists, are overshadowed by newer communities, like Incels, and that these newer communities are more toxic and share more misogynistic views.
Raiders of the Lost Kek: 3.5 Years of Augmented 4chan Posts from the Politically Incorrect Board (ICWSM 2020)
Authors: Antonis Papasavva (UCL), Savvas Zannettou, Emiliano De Cristofaro (UCL), Gianluca Stringhini (Boston University), Jeremy Blackburn (Binghamton University)
Short description: In this work, we make publicly available a large-scale dataset from 4chan’s Politically Incorrect board (/pol/). The dataset includes all posts made on /pol/ over a period of 3.5 years. It can be valuable to researchers who aim to study socio-technical issues on the Web, in particular issues that arise on fringe Web communities like /pol/.
The Pushshift Telegram Dataset (ICWSM 2020)
Authors: Jason Baumgartner (Pushshift), Savvas Zannettou, Megan Squire (Elon University), Jeremy Blackburn (Binghamton University)
Short description: In this dataset paper, we perform a large-scale data collection on Telegram. We make publicly available 317M messages from 27K groups, along with our data collection source code, to support researchers who aim to study Telegram.