Theory of Distributed Systems

Broadly speaking, distributed computing concerns systems that consist of multiple agents that act based on local information. The main challenge is typically how agents gain access to sufficient information to collaboratively solve a task quickly, despite limits on communication, faults, or inaccurate data.

Research Area

We are interested in all aspects of distributed computing, but the following are the topics we currently focus on.

  • Graph Algorithms:

    Naturally, well-known graph structures like dominating sets, colorings, minimum spanning trees, matchings, etc. are of great utility in distributed systems, too. We tackle these problems with all available means: randomization, approximation, decomposition, and so on. The key differences to centralized algorithms are that each node can perform its own computations, initially is aware of its neighborhood in the graph only, and needs to learn its local share of the output only (e.g., whether it is in the dominating set or not). To this end, nodes communicate over the edges of the graph; this communication is the main cost factor.

  • Fault-Tolerance:

    The nodes of a distributed system can be used to solve tasks redundantly, permitting to limit the effets of faults on the system as a whole. A second aspect is that faults are not necessarily permanent, implying that individual nodes or an entire system can recover from corrupted memory states that result from transient faults. We study fault-tolerance in distributed systems, examining to what extent it can be prevented that faulty nodes corrupt the state of other nodes, under what circumstances it is possible to recover from transients and how fast, and how costly respective algorithms are with respect to communication and time complexity.

  • Design of Reliable Hardware:

    Today's Very Large Scale Integrated (VLSI) circuits operate on such small length and time scales that they become distributed systems in their own right. Small subcircuits take the role of nodes that communicate via wires - the "edges" of the communication graph - and billions of individual components render fault-tolerance mission-critical. Hence, insights from both of the above domains are useful for designing reliable hardware. However, VLSI circuits add unique challenges. In low-level hardware, even simple computations like an addition require substantial effort, and violation of timing constraints may induce so-called metastability, states of the logic that lie outside the abstraction of binary logic. We seek to adapt the algorithmic ideas from above areas to this environment, with the goal of devising hardware with strong, provable fault-tolerance properties.

Sample Publications