Opportunities for Master’s Theses and Research Internships
We are looking for motivated students (m/f/d) interested in writing their Master’s thesis or conducting a research internship in our group.
Our main research area is data center networking. We expect candidates to have an interest in and familiarity with networking systems. A good grade in at least one of Operating Systems, Distributed Systems, or a related advanced course/seminar is considered a plus.
Applicants should be proficient in systems programming (C/C++) and scripting (e.g., Python, Bash), and familiar with basic development tools (e.g., make, gdb).
Experience with network stack development (e.g., kernel-level or user-space network stacks, DPDK) or with network simulators (e.g., htsim, ns-3) is helpful but not required to start.
Students working with us will design and evaluate novel ideas in the context of large-scale network modeling (e.g., via simulation) and/or systems engineering challenges (e.g., via prototyping).
Please note that, as a small group, our advising capacity is limited. We can only supervise a small number of students at a time and may accept only a few applications. Our projects typically involve a significant amount of system design, low-level programming, debugging, and hands-on effort. Please apply only if you are genuinely interested in networking systems, and not just looking for any thesis or internship project.
If you are interested, please send an email to rg2-hiring@mpi-inf.mpg.de including:
- Your CV
- Your academic transcript
- A brief statement of your research interests
- The type of project you are interested in (see examples below)
Optical / Reconfigurable Data Center Network Architectures
The slowdown of Moore’s law for electrical switches has driven the development of reconfigurable data center networks.
These networks leverage high-capacity, power-efficient optical circuit switches to create large-scale network fabrics for electrical endpoints such as GPUs in AI-specialized compute clusters.
Over the past decade, several reconfigurable architectures with varying hardware requirements have been proposed.
We are particularly interested in fast-switching architectures (microsecond- and nanosecond-scale optical circuit switching) and have several ongoing projects in this area.
Our current work focuses on the intersection of routing and topology design for these emerging architectures.
Selected Papers:
- Uniform-Cost Multi-Path Routing for Reconfigurable Data Center Networks (2024)
- Unlocking Diversity of Fast-Switched Optical Data Center Networks With Unified Routing (2025)
- Hop-On Hop-Off Routing: A Fast Tour across the Optical Data Center Network for Latency-Sensitive Flows (2022)
- OpenOptics: An Open Research Framework for Optical Data Center Networks (2024)
Transport Protocols and Congestion Control for Data Center Networks
Next-generation data centers demand both low-latency and high-throughput data delivery.
With advanced hardware and in-network assistance, transport protocols have evolved beyond traditional TCP variants to achieve unprecedented performance.
We are particularly interested in proactive transport protocols, as well as high-performance hardware transports such as Remote Direct Memory Access (RDMA).
Selected Papers:
- Unlocking Superior Performance in Reconfigurable Data Center Networks with Credit-Based Transport (2025)
- Rethinking Transport Protocols for Reconfigurable Data Centers (2024)
Network-Accelerated Machine Learning Systems
Machine learning model training and inference have become dominant workloads in modern data centers. Efficiently utilizing thousands of interconnected accelerators to train and serve large models is a major system challenge. Our research tackles this problem across multiple system layers, with a particular focus on networking aspects, including parallelism strategies, network topology design, and traffic scheduling.
We are seeking motivated interns and thesis students with experience in one or more of the following areas:
- Parallelism techniques for distributed training and inference
- Machine learning frameworks (JAX/PyTorch)
- Collective communication libraries (NCCL/NVSHMEM)
- GPU kernel programming (CUDA/Triton)
- Machine learning compilers (XLA/torch.compile)
Selected Papers: