Internet Security

Illuminating Large-Scale IPv6 Scanning in the Internet

Investigators: Oliver Gasser in cooperation with Philipp Richter and Arthur Berger (Akamai)

While scans of the IPv4 space are ubiquitous, very little is known today about scanning activity in the IPv6 Internet. Scans in the IPv4 Internet have been studied for decades, but their IPv6 counterparts remain understudied. Therefore, in this project, we performed a longitudinal and detailed empirical study of large-scale IPv6 scanning behavior in the Internet, based on firewall logs captured at some 230,000 hosts of a major Content Distribution Network (CDN). We presented detailed analyses of large-scale IPv6 scans carried out over the course of 15 months, as seen from this CDN. We analyzed scan sources and studied targeted services and addresses. We found that, unlike IPv4 scans, large-scale IPv6 scans are still comparatively rare events, originating from only some 60 ASes. Further, IPv6 scan packets are concentrated on a small number of very active scan sources, with the two most active sources accounting for more than 70% of all logged scan traffic throughout our measurement window. Many large-scale IPv6 scans do not target a single or a small number of specific services, but rather scan large swaths of port numbers, sometimes exceeding 100 ports targeted per scan. This behavior more closely resembles general and unspecific penetration testing than the scanning patterns of botnets trying to spread laterally by exploiting individual vulnerabilities. Where possible, we contrasted our findings with what can be observed in publicly available traces, namely the MAWI dataset, and we discussed potential reasons for our observations. In this project [1], we developed methods to identify IPv6 scans, assessed current and past levels of IPv6 scanning activity, and studied dominant characteristics of scans, including scanner origins, targeted services, and insights into how scanners find target IPv6 addresses. This project identifies and highlights new challenges in detecting scanning activity in the IPv6 Internet, and uncovers that today’s scans of the IPv6 space show widely different characteristics when compared to the more well-known IPv4 scans.
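To make the idea of identifying scans from firewall logs more concrete, the following is a minimal sketch of a threshold heuristic: logged packets are grouped by source, and a source is flagged once it has contacted many distinct destinations. The record layout, the aggregation of sources to /64 prefixes, the threshold of 100 destinations, and the function names are assumptions chosen for illustration; this is not the detection method developed in [1].

```python
from collections import defaultdict
from ipaddress import IPv6Network
from typing import Iterable

# Hypothetical log record: (source address, destination address, destination port).
LogEntry = tuple[str, str, int]


def source_prefix(addr: str, prefix_len: int = 64) -> str:
    """Aggregate a source address to its covering prefix (a /64 is an assumption)."""
    return str(IPv6Network(f"{addr}/{prefix_len}", strict=False))


def find_scanners(entries: Iterable[LogEntry],
                  min_destinations: int = 100,
                  prefix_len: int = 64) -> dict[str, dict[str, int]]:
    """Return source prefixes that contacted at least min_destinations distinct targets."""
    destinations: dict[str, set[str]] = defaultdict(set)
    ports: dict[str, set[int]] = defaultdict(set)
    for src, dst, port in entries:
        key = source_prefix(src, prefix_len)
        destinations[key].add(dst)
        ports[key].add(port)
    return {
        key: {"destinations": len(dsts), "ports": len(ports[key])}
        for key, dsts in destinations.items()
        if len(dsts) >= min_destinations
    }
```

Reporting the number of distinct ports per flagged source makes it possible to separate scans of a single service from scans that sweep many port numbers.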

• [1] P. Richter, O. Gasser, and A. Berger. Illuminating large-scale IPv6 scanning in the internet. In C. Barakat and C. Pelsser, eds., IMC ’22, ACM Internet Measurement Conference, Nice, France, 2022, pp. 410–418. ACM.

Third Time’s Not a Charm: Exploiting SNMPv3 for Router Fingerprinting

Investigators: Oliver Gasser in cooperation with Taha Albakour (TU Berlin), Robert Beverly (Naval Postgraduate School), and Georgios Smaragdakis (TU Delft)

Remote management functionalities are fundamental to efficient network operation. To address this need, the Simple Network Management Protocol (SNMP) was introduced in the 1980s and has since served as the de facto protocol for fault notification, diagnostics, configuration management, and statistics gathering in IP networks. As a core IP management protocol that is widely implemented, it is unsurprising that SNMP has been both exploited and leveraged as an attack vector; indeed, there are over 400 SNMP-related CVEs. The protocol itself has historically been insecure, with the first standardized versions (SNMPv1 and SNMPv2) including only basic authentication via unencrypted “community strings.” Security-conscious operators were therefore forced to restrict SNMP access to internal networks. The current SNMPv3 standard, introduced in 2002, is implemented on virtually all modern network equipment. The primary focus of SNMPv3 is to provide a secure version of the protocol by including mechanisms for robust authentication, integrity, and privacy. Of direct relevance to our work is the so-called SNMP “engine ID.” During synchronization with a client, the SNMPv3 agent exchanges its engine ID as a unique identifier. As noted in the RFC: the “snmpEngineID is the unique and unambiguous identifier of an SNMP engine. Since there is a one-to-one association between SNMP engines and SNMP entities, it also uniquely and unambiguously identifies the SNMP entity.” In this research project [1], we showed that adoption of the SNMPv3 network management protocol standard offers a unique, but likely unintended, opportunity for remotely fingerprinting network infrastructure in the wild. Specifically, by sending unsolicited and unauthenticated SNMPv3 requests, we obtained detailed information about the configuration and status of network devices, including vendor, uptime, and the number of restarts. More importantly, the reply contains a persistent and strong identifier that allows for lightweight Internet-scale alias resolution and dual-stack association. By launching active Internet-wide SNMPv3 scan campaigns, we showed that our technique can fingerprint more than 4.6 million devices, of which around 350k are network routers. Not only is our technique lightweight and accurate, it is also complementary to existing alias resolution, dual-stack inference, and device fingerprinting approaches. Our analysis not only provided fresh insights into the router deployment strategies of network operators worldwide, but also highlighted potential vulnerabilities of SNMPv3 as currently deployed.
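The way a persistent engine ID enables alias resolution and dual-stack association can be sketched as follows. The code assumes that scan results are already available as (IP address, raw engine ID) pairs; that input format and all function names are hypothetical. The header parsing follows the SnmpEngineID layout defined in RFC 3411 (high bit set, enterprise number in the first four octets, format indicator in the fifth). The sketch ignores complications such as devices that share a default engine ID, so it is an illustration rather than the pipeline used in [1].

```python
from collections import defaultdict
from ipaddress import ip_address
from typing import Iterable, Optional


def parse_engine_id(engine_id: bytes) -> Optional[tuple[int, int, bytes]]:
    """Split an RFC 3411 style engine ID into (enterprise number, format, payload).

    Returns None if the ID is too short or does not use the RFC 3411 layout
    (indicated by the most significant bit of the first octet).
    """
    if len(engine_id) < 5 or not engine_id[0] & 0x80:
        return None
    enterprise = int.from_bytes(engine_id[:4], "big") & 0x7FFFFFFF
    return enterprise, engine_id[4], engine_id[5:]


def group_aliases(observations: Iterable[tuple[str, bytes]]) -> dict[bytes, set]:
    """Group responding addresses by engine ID; groups of two or more are alias candidates."""
    by_engine: dict[bytes, set] = defaultdict(set)
    for ip, engine_id in observations:
        by_engine[engine_id].add(ip_address(ip))
    return {eid: ips for eid, ips in by_engine.items() if len(ips) > 1}


def is_dual_stack(ips: set) -> bool:
    """True if an alias set contains both IPv4 and IPv6 addresses."""
    return {ip.version for ip in ips} == {4, 6}
```

Because the grouping key is a single value returned by the device itself, one response per address is enough, which is what makes this style of alias resolution lightweight.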

• [1] T. Albakour, O. Gasser, R. Beverly, and G. Smaragdakis. Third time’s not a charm: Exploiting SNMPv3 for router fingerprinting. In IMC ’21, ACM Internet Measurement Conference, Virtual Event, USA, 2021, pp. 150–164. ACM.

A Multi-Perspective Analysis of Web Cookies

Investigators: Ali Rasaii, Shivani Singh, Devashish Gosain, and Oliver Gasser

Web cookies have been the subject of many research studies over the last few years. However, most existing research does not consider multiple crucial perspectives that can influence the cookie landscape, such as the client’s location, the impact of cookie banner interaction, and the operating system from which a website is visited. In this project [1], we conduct a comprehensive measurement study of the cookie landscape for the Tranco top-10k websites from different geographic locations and from multiple perspectives. One important factor that influences cookies is the use of cookie banners. Most research involving GDPR does not consider interaction with cookie banners (e.g., clicking accept/reject buttons). Thus, we develop an automated tool, BannerClick, to identify and interact with banners with an accuracy of 99% and 96%, respectively. We detect banners on about 47% of the Tranco top-10k websites in the EU region, whereas in non-EU regions we find banners on less than 30% of websites. We also investigate the difference in the number of cookies before and after interacting with a cookie banner, and we analyze the effect of banner interaction on different types of cookies (i.e., first-party, third-party, and tracking). For instance, we observe that websites send, on average, 5.5x more third-party cookies after clicking “accept”, underlining that it is critical to interact with banners when performing Web measurements. Additionally, we analyze statistical consistency, evaluate the widespread deployment of consent management platforms, compare landing to inner pages, and assess the impact of visiting a website on a desktop compared to a mobile phone. Our study highlights that all of these factors substantially impact the cookie landscape, and thus a multi-perspective approach should be taken when performing Web measurement studies.
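The before/after comparison for third-party cookies can be illustrated with a small sketch. It assumes that each crawl yields the list of cookie domains set during a visit, and it approximates the registrable domain by the last two labels of a host name. Both are simplifying assumptions: a real measurement would consult the Public Suffix List, and the function names here are hypothetical rather than part of BannerClick.

```python
from typing import Iterable, Optional


def registrable_domain(host: str) -> str:
    """Naive approximation: the last two labels (wrong for suffixes such as co.uk)."""
    labels = host.strip(".").lower().split(".")
    return ".".join(labels[-2:])


def is_third_party(cookie_domain: str, site_host: str) -> bool:
    """A cookie counts as third-party if its domain differs from the visited site's."""
    return registrable_domain(cookie_domain) != registrable_domain(site_host)


def third_party_increase(site_host: str,
                         before: Iterable[str],
                         after: Iterable[str]) -> Optional[float]:
    """Ratio of third-party cookies after vs. before banner interaction.

    Returns None when no third-party cookie was set before the interaction,
    because the ratio is undefined in that case.
    """
    n_before = sum(is_third_party(d, site_host) for d in before)
    n_after = sum(is_third_party(d, site_host) for d in after)
    return n_after / n_before if n_before else None
```

Averaging this ratio over many websites, separately for "accept" and "reject" interactions, gives the kind of per-cookie-type comparison described above.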

• [1] A. Rasaii, S. Singh, D. Gosain, and O. Gasser. Exploring the cookieverse: A multi-perspective analysis of web cookies. In A. Brunstrom, M. Flores, and M. Fiore, eds., Passive and Active Measurement (PAM 2023), Virtual Event, 2023, LNCS 13882, pp. 623–651. Springer.

Yarrpbox: Detecting Middleboxes at Internet-Scale

Investigators: Fahad Hilal and Oliver Gasser

The end-to-end principle is one of the foundations of the original Internet architecture. It states that packets should remain unaltered while in transit between the two endpoints of a communication. This principle is put to the test by middleboxes, i.e., intermediary devices manipulating traffic for purposes other than the standard functions of an IP router. In the current Internet, they fulfill a multitude of tasks such as thwarting attacks, censoring or monitoring users, expanding the address space, or balancing resources. Apart from breaking the end-to-end principle, the deployment of middleboxes comes with additional caveats. They introduce hidden points of failure, thus complicating the debugging of networks. Moreover, they stand in the way of innovations and improvements to protocols and their extensions. The Internet has seen an increased deployment of these middleboxes owing to the value they bring. Therefore, it is important to have a good understanding of the middlebox ecosystem in the Internet. In this work, we perform a multi-faceted middlebox analysis study. We develop Yarrpbox [1], a tool to efficiently perform middlebox detection measurements at Internet scale. Yarrpbox is over 300 times faster than the current state of the art and can conduct large-scale measurements in under 10 hours. With Yarrpbox, we perform IPv4-wide middlebox detection and find that nearly 10% of paths are affected by a total of 5.8k middlebox devices. We perform the first IPv6 study to date, uncovering a lower prevalence of middleboxes in IPv6. Moreover, we show that the location of a vantage point can have an effect on the results, leading to up to 600 more detected middleboxes. Additionally, we characterize middleboxes by mapping them to vendors and resolving aliases.

• [1] F. Hilal and O. Gasser. Yarrpbox: Detecting middleboxes at internet-scale. In CoNEXT’23, International Conference on Emerging Networking Experiments And Technologies, 2023. ACM. Accepted for publication.

Rusty Clusters? Dusting an IPv6 Research Foundation

Investigators: Oliver Gasser in cooperation with Johannes Zirngibl, Lion Steger, Patrick Sattler, and Georg Carle (Technical University of Munich)

The long-running IPv6 Hitlist service is an important foundation for IPv6 measurement studies. It helps to avoid infeasible scans of the complete address space by collecting valuable, unbiased IPv6 address candidates and regularly testing their responsiveness. However, the Internet itself is a quickly changing ecosystem that can affect long-running services, potentially introducing biases and obscurities into ongoing data collection. Frequent analyses and updates are necessary to keep the service valuable to the community. In this project [1], we showed that the existing IPv6 Hitlist is highly impacted by the Great Firewall of China, and we offer a cleaned view of the development of responsive addresses. We evaluated the development of the IPv6 Hitlist over the last four years and the new biases introduced by the accumulation of new addresses. Our findings allowed us to filter targets incorrectly tested as responsive. We identified 134 M addresses falsely reported as responsive to UDP/53 by the IPv6 Hitlist since 2018 due to the Great Firewall of China’s DNS injection. While the accumulated input showed an increasing bias towards some networks, the cleaned set of responsive addresses is well distributed and showed a steady increase in the number of addresses. Although it is a best practice to remove aliased prefixes from IPv6 hitlists, in our research project we showed that this also removes major content delivery networks. We analyzed aliased prefixes in more detail and investigated whether the initial definition of a single host responsive to a complete prefix remains correct or whether a set of addresses needs to be treated differently. We showed that aliased prefixes host at least 15 M domains, including ranked domains from different top lists. More than 98% of all IPv6 addresses announced by Fastly were labeled as aliased, and Cloudflare prefixes hosting more than 10 M domains were excluded. In combination with additional findings, we suggest that users of the hitlist include subsets of these prefixes in future research, depending on the hitlist usage, e.g., higher-layer protocol scans. Lastly, we evaluated different new address candidate sources, including target generation algorithms, to improve the coverage of the current IPv6 Hitlist.

We showed that a combination of different methodologies is able to identify 5.6 M new, responsive addresses. This is an increase of 174%; combined with the current IPv6 Hitlist, we identified 8.8 M responsive addresses in total. Finally, we updated the IPv6 Hitlist service to allow future research to use our findings within the established service.
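The notion of an aliased prefix, i.e., a prefix in which every address appears responsive, can be illustrated with a short sketch: probe a handful of randomly chosen addresses inside the prefix and treat the prefix as aliased if all of them answer. The `probe` callback stands in for an actual measurement and is hypothetical, as are the function names and the default of 16 samples; the sketch is not claimed to be the exact procedure of the IPv6 Hitlist service.

```python
import random
from ipaddress import IPv6Address, IPv6Network
from typing import Callable, Optional


def random_addresses(prefix: IPv6Network, count: int,
                     rng: random.Random) -> list[IPv6Address]:
    """Draw addresses uniformly at random from within the prefix."""
    return [prefix[rng.randrange(prefix.num_addresses)] for _ in range(count)]


def looks_aliased(prefix: IPv6Network,
                  probe: Callable[[IPv6Address], bool],
                  samples: int = 16,
                  rng: Optional[random.Random] = None) -> bool:
    """Heuristic: a prefix looks aliased if every random sample responds.

    In a sparsely populated /64, the chance that random addresses belong to
    real hosts is negligible, so consistent responses suggest that the whole
    prefix is answered by one system or one service.
    """
    rng = rng or random.Random()
    return all(probe(addr) for addr in random_addresses(prefix, samples, rng))
```

A filter built on such a test removes whole prefixes, which is why it can also exclude large content delivery networks that answer on every address.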

• [1] J. Zirngibl, L. Steger, P. Sattler, O. Gasser, and G. Carle. Rusty clusters?: Dusting an IPv6 research foundation. In C. Barakat and C. Pelsser, eds., IMC ’22, ACM Internet Measurement Conference, Nice, France, 2022, pp. 395–409. ACM.

One Bad Apple Can Spoil Your IPv6 Privacy

Investigators: Oliver Gasser and Said Jawad Saidi in cooperation with Georgios Smaragdakis (TU Delft)

IPv6 is increasingly adopted, in part to accommodate the millions of smart devices that have already been installed in homes. In this project [1], we found that the privacy of a substantial fraction of end-users is still at risk, despite the efforts by ISPs and electronics vendors to improve end-user security, e.g., by adopting prefix rotation and IPv6 privacy extensions. By analyzing passive data from a large European ISP, we found that the privacy of around 19% of end-users can be at risk due to the lack of IPv6 privacy extension usage. Privacy extensions are a technique to generate a random IPv6 address suffix, instead of using the persistent and therefore trackable MAC address in the IPv6 address. When we investigated the root causes, we noticed that a single device at home that encodes its MAC address into the IPv6 address can be utilized as a tracking identifier for the entire end-user prefix, even if other devices use IPv6 privacy extensions. Our results showed that IoT devices contribute the most to this privacy leakage and, to a lesser extent, personal computers and mobile devices. To our surprise, some of the most popular IoT manufacturers have not yet adopted privacy extensions that could otherwise mitigate this privacy risk. Finally, we showed that third-party providers, e.g., popular content, application, or service providers, can track up to 17% of subscriber lines in our study due to the lack of IPv6 privacy extensions.
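How a MAC address embedded in an IPv6 address becomes a tracking identifier can be shown with a brief sketch. Addresses built with the Modified EUI-64 scheme carry the bytes ff:fe in the middle of the 64-bit interface identifier and the MAC address around them, with the universal/local bit inverted. The function names below are hypothetical, the /64 end-user prefix length is an assumption, and a randomly generated suffix can contain ff:fe by chance, so matches are candidates rather than proof.

```python
from collections import defaultdict
from ipaddress import IPv6Address, IPv6Network
from typing import Iterable, Optional


def eui64_mac(addr: str) -> Optional[str]:
    """Recover the MAC address from a Modified EUI-64 interface identifier, if present."""
    iid = (int(IPv6Address(addr)) & ((1 << 64) - 1)).to_bytes(8, "big")
    if iid[3:5] != b"\xff\xfe":
        return None  # no EUI-64 marker, e.g. a privacy-extension address
    # Undo the inverted universal/local bit in the first octet.
    mac = bytes([iid[0] ^ 0x02]) + iid[1:3] + iid[5:8]
    return ":".join(f"{octet:02x}" for octet in mac)


def prefixes_by_mac(addresses: Iterable[str]) -> dict[str, set[str]]:
    """Map each recovered MAC address to the /64 prefixes in which it was observed."""
    seen: dict[str, set[str]] = defaultdict(set)
    for addr in addresses:
        mac = eui64_mac(addr)
        if mac is not None:
            seen[mac].add(str(IPv6Network(f"{addr}/64", strict=False)))
    return dict(seen)
```

If the same MAC address shows up in several prefixes over time, those prefixes can be linked to one subscriber line despite prefix rotation, and every other device behind that prefix is linked along with it.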

• [1] S. J. Saidi, O. Gasser, and G. Smaragdakis. One bad apple can spoil your IPv6 privacy. ACM SIGCOMM Computer Communication Review, 52(2):10–19, 2022.

Enabling Multi-hop ISP-Hypergiant Collaboration

Investigators: Cristian Munteanu, Oliver Gasser, and Anja Feldmann in collaboration with Georgios Smaragdakis (TU Delft, Netherlands) and Ingmar Poese (BENOCS GmbH)

Today, there is an increasing number of peering agreements between Hypergiants and networks that benefit millions of end-users. However, the majority of Autonomous Systems (more than 50 thousand) do not currently enjoy the benefit of interconnecting directly with Hypergiants to optimally select the path for delivering Hypergiant traffic to their users. In our research, we develop and evaluate an architecture that can help this long tail of networks. Our architecture includes multiple scenarios, with and without the cooperation of the transit provider. Indeed, the most basic one does not require any changes to the operation of transit providers, nor any re-negotiation of the relationship between networks and their transit providers or between Hypergiants and transit providers. In our architecture, a network establishes an out-of-band communication channel with Hypergiants that can be two (or more) AS-hops away and, optionally, with the transit provider. This channel enables the exchange of network information to better assign requests of end-users to appropriate Hypergiant servers. Our analysis using operational data shows that our architecture can optimize, on average, 15% of Hypergiants’ traffic and 11% of the overall traffic of networks that are not interconnected with Hypergiants. The gains are even higher during peak hours, when available capacity can be scarce. Our results also show that for some Hypergiants, more than 46% of their traffic delivered to networks via non-direct interconnection can be optimized.