Internet Security

Illuminating Large-Scale IPv6 Scanning in the Internet

Investigators: Oliver Gasser in cooperation with Philipp Richter and Arthur Berger (Akamai)

While scans of the IPv4 address space are ubiquitous and have been studied for decades, very little is known about scanning activity in the IPv6 Internet. In this project [1], we performed a longitudinal and detailed empirical study of large-scale IPv6 scanning behavior in the Internet, based on firewall logs captured over the course of 15 months at some 230,000 hosts of a major Content Distribution Network (CDN). We developed methods to identify IPv6 scans, assessed current and past levels of IPv6 scanning activity, and studied dominant characteristics of scans, including scanner origins, targeted services, and insights into how scanners find target IPv6 addresses. We found that, unlike IPv4 scans, large-scale IPv6 scans are still comparatively rare events, originating from only some 60 ASes. Further, IPv6 scan packets are concentrated on a small number of very active scan sources, with the two most active sources accounting for more than 70% of all logged scan traffic throughout our measurement window. Many large-scale IPv6 scans do not target a single or a small number of specific services, but rather scan large swaths of port numbers, sometimes exceeding 100 ports targeted per scan. This behavior more closely resembles general and unspecific penetration testing than the scanning patterns of botnets trying to spread laterally by exploiting individual vulnerabilities. Where possible, we contrasted our findings with what can be observed in publicly available traces from the MAWI dataset, and we discussed potential reasons for our observations. This project identifies and highlights new challenges in detecting scanning activity in the IPv6 Internet, and uncovers that today’s scans of the IPv6 space show widely different characteristics from the better-known IPv4 scans.
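
As a rough illustration of what identifying scans in firewall logs can involve, the following Python sketch aggregates logged packets by source prefix and flags sources that contact many distinct destination addresses. The record layout, the aggregation of sources to /64 prefixes, and the threshold of 100 destinations are assumptions chosen for illustration; they are not the detection criteria used in [1].

```python
import ipaddress
from collections import defaultdict


def source_prefix(addr, prefix_len=64):
    """Return the covering prefix of an IPv6 source address as a string."""
    return str(ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False))


def find_scan_sources(records, min_targets=100, prefix_len=64):
    """Flag source prefixes that hit at least `min_targets` distinct addresses.

    `records` is an iterable of (src_addr, dst_addr, dst_port) tuples, a
    hypothetical simplification of a firewall log entry.
    Returns {source_prefix: (distinct_targets, distinct_ports)}.
    """
    targets = defaultdict(set)
    ports = defaultdict(set)
    for src, dst, port in records:
        key = source_prefix(src, prefix_len)
        targets[key].add(dst)
        ports[key].add(port)
    return {
        key: (len(targets[key]), len(ports[key]))
        for key in targets
        if len(targets[key]) >= min_targets
    }
```

Aggregating by prefix rather than by individual address is a design choice in this sketch: a scanner that spreads its probes over many source addresses within one prefix would otherwise stay below a per-address threshold. The number of distinct ports is reported alongside, since port breadth is one of the characteristics discussed above.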

References
• [1] P. Richter, O. Gasser, and A. Berger. Illuminating large-scale IPv6 scanning in the internet. In C. Barakat and C. Pelsser, eds., IMC ’22, ACM Internet Measurement Conference, Nice, France, 2022, pp. 410–418. ACM.

Third Time’s Not a Charm: Exploiting SNMPv3 for Router Fingerprinting

Investigators: Oliver Gasser in cooperation with Taha Albakour (TU Berlin), Robert Beverly (Naval Postgraduate School), and Georgios Smaragdakis (TU Delft)

Remote management functionalities are fundamental to efficient network operation. To address this need, the Simple Network Management Protocol (SNMP) was introduced in the 1980s and has since served as the de facto protocol for fault notification, diagnostics, configuration management, and statistics gathering in IP networks. As a core IP management protocol that is widely implemented, it is unsurprising that SNMP has been both exploited and leveraged as an attack vector; indeed, there are over 400 SNMP-related CVEs. The protocol itself has historically been insecure, with the first standardized versions (SNMPv1 and SNMPv2) including only basic authentication via unencrypted “community strings.” Security-conscious operators were therefore forced to restrict SNMP access to internal networks. The current SNMPv3 standard, introduced in 2002, is implemented on virtually all modern network equipment. The primary focus of SNMPv3 is to provide a secure version of the protocol by including mechanisms for robust authentication, integrity, and privacy. Of direct relevance to our work is the so-called SNMP “engine ID.” During synchronization with a client, the SNMPv3 agent sends its engine ID as a unique identifier. As noted in the RFC: the “snmpEngineID is the unique and unambiguous identifier of an SNMP engine. Since there is a one-to-one association between SNMP engines and SNMP entities, it also uniquely and unambiguously identifies the SNMP entity.” In this research project [1], we showed that adoption of the SNMPv3 network management protocol standard offers a unique, but likely unintended, opportunity for remotely fingerprinting network infrastructure in the wild. Specifically, by sending unsolicited and unauthenticated SNMPv3 requests, we obtained detailed information about the configuration and status of network devices, including vendor, uptime, and the number of restarts. More importantly, the reply contains a persistent and strong identifier that allows for lightweight Internet-scale alias resolution and dual-stack association. By launching active Internet-wide SNMPv3 scan campaigns, we showed that our technique can fingerprint more than 4.6 million devices, of which around 350k are network routers. Our technique is not only lightweight and accurate, but also complementary to existing alias resolution, dual-stack inference, and device fingerprinting approaches. Our analysis provided fresh insights into the router deployment strategies of network operators worldwide and highlighted potential vulnerabilities of SNMPv3 as currently deployed.

References
• [1] T. Albakour, O. Gasser, R. Beverly, and G. Smaragdakis. Third time’s not a charm: Exploiting SNMPv3 for router fingerprinting. In IMC ’21, ACM Internet Measurement Conference, Virtual Event, USA, 2021, pp. 150–164. ACM.

A Multi-Perspective Analysis of Web Cookies

Investigators: Ali Rasaii, Shivani Singh, Devashish Gosain, and Oliver Gasser

Web cookies have been the subject of many research studies over the last few years. However, most existing research does not consider multiple crucial perspectives that can influence the cookie landscape, such as the client’s location, the impact of cookie banner interaction, and the operating system from which a website is visited. In this project [1], we conduct a comprehensive measurement study of the cookie landscape for the Tranco top-10k websites from different geographic locations and from multiple perspectives. One important factor that influences cookies is the use of cookie banners. Most research involving GDPR does not consider interaction with cookie banners (e.g., clicking accept/reject buttons). Thus, we develop an automated tool, BannerClick, which identifies banners with an accuracy of 99% and interacts with them with an accuracy of 96%. We detect banners on about 47% of the Tranco top-10k websites in the EU region, whereas in non-EU regions we find banners on less than 30% of websites. We also analyze the effect of banner interaction on different types of cookies (i.e., first-party, third-party, and tracking cookies). For instance, we observe that websites send, on average, 5.5x more third-party cookies after clicking “accept”, underlining that it is critical to interact with banners when performing Web measurements. Additionally, we analyze statistical consistency, evaluate the widespread deployment of consent management platforms, compare landing pages to inner pages, and assess the impact of visiting a website on a desktop compared to a mobile phone. Our study highlights that all of these factors substantially impact the cookie landscape, and thus a multi-perspective approach should be taken when performing Web measurement studies.
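
The distinction between first-party and third-party cookies can be sketched as follows in Python. The helper names are hypothetical, and taking the last two labels of a host name as its registrable domain is a deliberately naive assumption made to keep the example self-contained; an actual measurement would rely on the Public Suffix List. The sketch is not taken from BannerClick.

```python
def registrable_domain(host):
    """Naive approximation of the registrable domain: the last two labels.

    This is an assumption for illustration and is wrong for suffixes such
    as "co.uk"; use the Public Suffix List in practice.
    """
    labels = host.lower().strip(".").split(".")
    return ".".join(labels[-2:])


def count_by_party(site_host, cookie_domains):
    """Count cookies set while visiting `site_host` as first or third party."""
    site = registrable_domain(site_host)
    counts = {"first": 0, "third": 0}
    for domain in cookie_domains:
        party = "first" if registrable_domain(domain) == site else "third"
        counts[party] += 1
    return counts
```

Applying `count_by_party` to the cookies observed before and after a banner interaction, and dividing the two third-party counts, yields the kind of per-website ratio that underlies an aggregate figure such as the one reported above.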

References
• [1] A. Rasaii, S. Singh, D. Gosain, and O. Gasser. Exploring the cookieverse: A multi-perspective analysis of web cookies. In A. Brunstrom, M. Flores, and M. Fiore, eds., Passive and Active Measurement (PAM 2023), Virtual Event, 2023, LNCS 13882, pp. 623–651. Springer.

Yarrpbox: Detecting Middleboxes at Internet-Scale

Investigators: Fahad Hilal and Oliver Gasser

The end-to-end principle is one of the foundations of the original Internet architecture. It states that packets should remain unaltered while in transit between the two endpoints of a communication. This principle is put to the test by middleboxes, i.e., intermediary devices that manipulate traffic for purposes other than the standard functions of an IP router. In the current Internet, they fulfill a multitude of tasks such as thwarting attacks, censoring or monitoring users, expanding the address space, or balancing resources. Apart from breaking the end-to-end principle, the deployment of middleboxes comes with additional caveats: they introduce hidden points of failure, thus complicating the debugging of networks, and they stand in the way of innovations and improvements to protocols and their extensions. Nevertheless, the Internet has seen an increased deployment of middleboxes owing to the value they bring. Therefore, it is important to have a good understanding of the middlebox ecosystem in the Internet. In this work, we perform a multi-faceted middlebox analysis. We develop Yarrpbox [1], a tool to efficiently perform middlebox detection measurements at Internet scale. Yarrpbox is over 300 times faster than the current state of the art and can conduct large-scale measurements in under 10 hours. With Yarrpbox, we perform IPv4-wide middlebox detection and find that nearly 10% of paths are affected by a total of 5.8k middlebox devices. We perform the first such study for IPv6 to date, uncovering a lower prevalence of middleboxes in IPv6. Moreover, we show that the location of a vantage point can have an effect on the results, leading to up to 600 more detected middleboxes. Additionally, we characterize middleboxes by mapping them to vendors and resolving aliases.
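
One common way to detect in-path packet modification is to send TTL-limited probes and compare the header fields that were sent with the copy of the packet quoted in the ICMP error returned by each hop. The Python sketch below illustrates only this comparison step. The dictionary-based header representation and the field names are assumptions for illustration, and the text above does not describe Yarrpbox's internals, so this should not be read as its implementation.

```python
def modified_fields(sent, quoted):
    """Return {field: (sent_value, quoted_value)} for fields that differ."""
    return {
        field: (sent[field], quoted[field])
        for field in sent
        if field in quoted and sent[field] != quoted[field]
    }


def locate_modifications(sent, quotes_by_hop, ignore=("ttl", "ip_checksum")):
    """Map each modified field to the first hop at which the change is seen.

    `quotes_by_hop` maps a hop number to the header fields quoted by that
    hop. Fields in `ignore` change legitimately at every router (the TTL is
    decremented and the IP checksum is recomputed) and are skipped.
    """
    first_seen = {}
    for hop in sorted(quotes_by_hop):
        for field in modified_fields(sent, quotes_by_hop[hop]):
            if field not in ignore:
                first_seen.setdefault(field, hop)
    return first_seen
```

A field that first differs at hop n was modified somewhere between the last hop that quoted it unchanged and hop n, which narrows down the location of the responsible middlebox.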

References
• [1] F. Hilal and O. Gasser. Yarrpbox: Detecting middleboxes at internet-scale. In CoNEXT’23, International Conference on Emerging Networking Experiments And Technologies, 2023. ACM. Accepted for publication.

Rusty Clusters? Dusting an IPv6 Research Foundation

Investigators: Oliver Gasser in cooperation with Johannes Zirngibl, Lion Steger, Patrick Sattler, and Georg Carle (Technical University of Munich)

The long-running IPv6 Hitlist service is an important foundation for IPv6 measurement studies. Because scanning the complete IPv6 address space is infeasible, the service collects valuable, unbiased IPv6 address candidates and regularly tests their responsiveness. However, the Internet itself is a quickly changing ecosystem that can affect long-running services, potentially introducing biases and obscurities into ongoing data collection. Frequent analyses as well as updates are necessary to keep the service valuable to the community. In this project [1], we showed that the existing IPv6 Hitlist is highly impacted by the Great Firewall of China, and we offer a cleaned view on the development of responsive addresses. We evaluated the development of the IPv6 Hitlist over the last four years and the new biases introduced by the accumulation of new addresses. Our findings allowed us to filter targets incorrectly tested as responsive. We identified 134 M addresses falsely reported as responsive to UDP/53 by the IPv6 Hitlist since 2018 due to the Great Firewall of China’s DNS injection. While the accumulated input showed an increasing bias towards some networks, the cleaned set of responsive addresses is well distributed and showed a steady increase in the number of addresses. Although it is a best practice to remove aliased prefixes from IPv6 hitlists, in our research project we showed that this also removes major content delivery networks. We analyzed aliased prefixes in more detail and investigated whether the initial definition of a single host responsive to a complete prefix remains correct or whether a set of addresses needs to be treated differently. We showed that aliased prefixes host at least 15 M domains, including ranked domains from different top lists. More than 98% of all IPv6 addresses announced by Fastly were labeled as aliased, and Cloudflare prefixes hosting more than 10 M domains were excluded. In combination with additional findings, we suggest that users of the hitlist include subsets of these prefixes in future research, depending on the hitlist usage, e.g., higher-layer protocol scans. Lastly, we evaluated different new address candidate sources, including target generation algorithms, to improve the coverage of the current IPv6 Hitlist. We showed that a combination of different methodologies is able to identify 5.6 M new, responsive addresses. This amounts to an increase of 174%; combined with the current IPv6 Hitlist, we identified 8.8 M responsive addresses in total. Finally, we updated the IPv6 Hitlist service to allow future research to use our findings within the established service.
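
The notion of an aliased prefix, i.e., a prefix in which every address appears responsive, can be illustrated with the following Python sketch. It draws one pseudo-random address from each of the 16 sub-prefixes that are four bits longer than the tested prefix and treats the prefix as aliased only if all of them respond. The choice of 16 probes, the function names, and the `probe` callable (standing in for an actual ICMP, TCP, or UDP probe) are assumptions for illustration and are not taken from the text above.

```python
import ipaddress
import random


def candidate_addresses(prefix, rng=None):
    """Pick one pseudo-random address in each 4-bit-longer sub-prefix.

    Requires a prefix length of at most 124 so that 16 sub-prefixes exist.
    """
    rng = rng or random.Random()
    network = ipaddress.ip_network(prefix)
    addresses = []
    for sub in network.subnets(prefixlen_diff=4):
        offset = rng.randrange(sub.num_addresses)
        addresses.append(sub[offset])
    return addresses


def is_aliased(prefix, probe, rng=None):
    """Return True if every candidate address in `prefix` answers the probe.

    `probe` is a hypothetical callable taking an IPv6Address and returning
    True when the address responds.
    """
    return all(probe(addr) for addr in candidate_addresses(prefix, rng))
```

Spreading the candidates over all sub-prefixes makes it unlikely that a sparsely populated prefix passes the test by chance, because randomly chosen addresses in a non-aliased prefix are almost never assigned to a host.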

References
• [1] J. Zirngibl, L. Steger, P. Sattler, O. Gasser, and G. Carle. Rusty clusters?: Dusting an IPv6 research foundation. In C. Barakat and C. Pelsser, eds., IMC ’22, ACM Internet Measurement Conference, Nice, France, 2022, pp. 395–409. ACM.

One Bad Apple Can Spoil Your IPv6 Privacy

Investigators: Oliver Gasser and Said Jawad Saidi in cooperation with Georgios Smaragdakis (TU Delft)

IPv6 is being increasingly adopted, in part to accommodate the millions of smart devices that have already been installed in homes. In this project [1], we found that the privacy of a substantial fraction of end-users is still at risk, despite the efforts by ISPs and electronics vendors to improve end-user security, e.g., by adopting prefix rotation and IPv6 privacy extensions. Privacy extensions are a technique to generate a random IPv6 address suffix, instead of using the persistent and therefore trackable MAC address in the IPv6 address. By analyzing passive data from a large European ISP, we found that the privacy of around 19% of end-users can be at risk due to a lack of IPv6 privacy extensions. When we investigated the root causes, we noticed that a single device at home that encodes its MAC address into the IPv6 address can be utilized as a tracking identifier for the entire end-user prefix, even if other devices use IPv6 privacy extensions. Our results showed that IoT devices contribute the most to this privacy leakage and, to a lesser extent, personal computers and mobile devices. To our surprise, some of the most popular IoT manufacturers have not yet adopted privacy extensions that could otherwise mitigate this privacy risk. Finally, we showed that third-party providers, e.g., popular content, application, or service providers, can track up to 17% of subscriber lines in our study due to the lack of IPv6 privacy extensions.
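
How a MAC address embedded in an IPv6 address becomes a tracking identifier can be sketched as follows in Python. An interface identifier derived via the modified EUI-64 scheme carries the octets ff:fe in its middle and the MAC address with one flipped bit around them. The function names and the use of /64 prefixes to represent an end-user prefix are assumptions for illustration; the sketch is not the analysis pipeline of [1].

```python
import ipaddress
from collections import defaultdict


def embedded_mac(addr):
    """Return the MAC address encoded in an EUI-64 based IPv6 address.

    Returns None if the interface identifier lacks the ff:fe marker, as is
    the case for (most) addresses generated with privacy extensions.
    """
    iid = ipaddress.IPv6Address(addr).packed[8:]
    if iid[3:5] != b"\xff\xfe":
        return None
    # Undo the flip of the universal/local bit in the first octet.
    mac = bytes([iid[0] ^ 0x02]) + iid[1:3] + iid[5:8]
    return mac.hex(":")


def prefixes_by_mac(addresses):
    """Map each embedded MAC address to the /64 prefixes it was seen in."""
    seen = defaultdict(list)
    for addr in addresses:
        mac = embedded_mac(addr)
        if mac is None:
            continue
        prefix = str(ipaddress.ip_network(f"{addr}/64", strict=False))
        if prefix not in seen[mac]:
            seen[mac].append(prefix)
    return dict(seen)
```

Because the recovered MAC address stays the same when the ISP rotates the prefix, an observer can link the old and the new prefix to the same subscriber line, which in turn links all other devices behind it regardless of whether they use privacy extensions.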

References
• [1] S. J. Saidi, O. Gasser, and G. Smaragdakis. One bad apple can spoil your IPv6 privacy. ACM SIGCOMM Computer Communication Review, 52(2):10–19, 2022.

Enabling Multi-hop ISP-Hypergiant Collaboration

Investigators: Cristian Munteanu, Oliver Gasser, and Anja Feldmann in cooperation with Georgios Smaragdakis (TU Delft) and Ingmar Poese (BENOCS GmbH)

Today, an increasing number of peering agreements between Hypergiants and networks benefit millions of end-users. However, the majority of Autonomous Systems (more than 50 thousand) do not currently enjoy the benefit of interconnecting directly with Hypergiants to optimally select the path for delivering Hypergiant traffic to their users. In our research, we develop and evaluate an architecture that can help this long tail of networks. Our architecture supports multiple scenarios, with and without the cooperation of the transit provider. Indeed, the most basic one does not require any changes to the operation of transit providers, nor any re-negotiation of the relationship between networks and their transit providers or between Hypergiants and transit providers. In our architecture, a network establishes an out-of-band communication channel with Hypergiants that can be two (or more) AS-hops away and, optionally, with the transit provider. This channel enables the exchange of network information to better assign requests of end-users to appropriate Hypergiant servers. Our analysis using operational data shows that our architecture can optimize, on average, 15% of Hypergiants’ traffic and 11% of the overall traffic of networks that are not interconnected with Hypergiants. The gains are even higher during peak hours, when available capacity can be scarce. Our results also show that for some Hypergiants, more than 46% of their traffic delivered to networks via non-direct interconnection can be optimized.