Copyright notice: The respective pre-print or author-prepared version of a paper
(available after clicking the
-icon)
is provided for timely communication among scholars,
without permission for further distribution.
The definitive version of a paper is the published version.
The hypervolume indicator (HYP) is a popular measure for the quality of a set of n solutions in d-space. We discuss its asymptotic worst-case runtimes and several lower bounds depending on different complexity-theoretic assumptions. Assuming that P does not equal NP, there is no algorithm with runtime poly(n,d). Assuming the exponential time hypothesis, there is no algorithm with runtime no(d). In contrast to these worst-case bounds, we study the average-case complexity of HYP for points distributed i.i.d. at random on a d-dimensional simplex. We present a general framework which translates any algorithm for HYP with worst-case runtime nf(d) to an algorithm with worst-case runtime nf(d)+1 and fixed-parameter-tractable (FPT) average-case runtime. This can be used to show that HYP can be solved in expected time O(dd2/2 n + d n2 ), which implies that HYP is FPT on average while it is W[1]-hard in the worst-case. For constant dimension d this gives an algorithm for HYP with runtime O(n2) on average. This is the first result proving that HYP is asymptotically easier in the average case. It gives a theoretical explanation why most HYP algorithms perform much better on average than their theoretical worst-case runtime predicts.
Randomized rumor spreading was recently shown to be a very efficient mechanism to spread information in preferential attachment networks. Most interesting from the algorithm design point of view was the observation that the asymptotic run-time drops when memory is used to avoid re-contacting neighbors within a small number of rounds.
In this experimental investigation, we confirm that a small amount of memory indeed reduces the run-time of the protocol even for small network sizes. We observe that one memory cell per node suffices to reduce the run-time significantly; more memory helps comparably little. Aside from extremely sparse graphs, preferential attachment graphs perform faster than all other graph classes examined. This holds independent of the amount of memory, but preferential attachment graphs benefit the most from the use of memory. We also analyze the influence of the network density and the size of the memory. For the asynchronous version of the rumor spreading protocol, we observe that the theoretically predicted asymptotic advantage of preferential attachment graphs is smaller than expected. There are other topologies which benefit even more from asynchrony.
We complement our findings on artificial network models by the corresponding experiments on crawls of popular online social networks, where again we observe extremely rapid information dissemination and a sizable benefit from using memory and asynchrony.
Finding cliques in graphs is a classical problem which is in general NP-hard and parameterized intractable. However, in typical applications like social networks or protein-protein interaction networks, the considered graphs are scale-free, i.e., their degree sequence follows a power law. Their specific structure can be algorithmically exploited and makes it possible to solve clique much more efficiently. We prove that on inhomogeneous random graphs with n nodes and power law exponent γ, cliques of size k can be found in time O(n2) for γ ≥ 3 and in time O(n exp(k4)) for 2<γ<3.
We show that the asynchronous push-pull protocol spreads rumors in preferential attachment graphs in time O(√log(n)) to all but a lower order fraction of the nodes with high probability. This is significantly faster than what synchronized protocols can achieve; an obvious lower bound for these is the average distance, which is known to be Θ(log n / loglog n) for preferential attachment graphs.
We study the convergence behavior of (μ+λ)-archiving algorithms. A (μ+λ)-archiving algorithm defines how to choose in each generation μ children from μ parents and λ offspring together. Archiving algorithms have to choose individuals online without knowing future offspring. Previous studies assumed the offspring generation to be best-case. We assume the initial population and the offspring generation to be worst-case and use the competitive ratio to measure how much smaller hypervolumes an archiving algorithm finds due to not knowing the future in advance. We prove that all archiving algorithms which increase the hypervolume in each step (if they can) are only μ-competitive. We also present a new archiving algorithm which is (4+2/μ)-competitive. This algorithm not only achieves a constant competitive ratio, but is also efficiently computable. Both properties provably do not hold for the commonly used greedy archiving algorithms, for example those used in SIBEA, SMS-EMOA, or the generational MO-CMA-ES.
The integrated analysis of data of different types and with various interdependencies is one of the major challenges in computational biology. Recently, we developed KeyPathwayMiner, a method that combines biological networks modeled as graphs with disease-specific genetic expression data gained from a set of cases (patients, cell lines, tissues, etc.). We aimed to find all maximal connected sub-graphs where all nodes but K are expressed in all cases but at most L, i.e. key pathways. To do so, we combined biological networks with OMICS data, instead of analyzing these data sets in isolation. Here we present an alternative approach: Now we aim to extract all maximal connected sub-networks where all but at most K nodes are expressed in all cases but in total (!) at most L, i.e. accumulated over all cases and all nodes in a solution. We call this strategy GLONE (global node exceptions); the previous problem we call INES (individual node exceptions). Since finding GLONE-components is NP-hard, we developed an Ant Colony Optimization algorithm and implemented it with the KeyPathwayMiner Cytoscape framework as an alternative to the INES algorithms. KeyPathwayMiner 3.0 now offers both the INES and the GLONE algorithms. It is available as a Cytoscape plugin and online at keypathwayminer.mpi-inf.mpg.de.
A random geometric graph (RGG) is defined by placing n points uniformly at random in the d-dimensional space [0,n1/d]d, and joining two points by an edge whenever their Euclidean distance is at most some fixed r. We assume that r is larger than the critical value for the emergence of a connected component with Ω(n) nodes. We show that, with high probability (w.h.p.), for any two connected nodes with a minimum Euclidean distance of ω(log n), their graph distance is only a constant factor larger than their Euclidean distance. This property implies that the diameter of the largest connected component is Θ(n1/d/r ) w.h.p.
We also analyze the following randomized broadcast algorithm on RGGs. At the beginning, only one node from the largest connected component of the RGG is informed. Then, in each round, each informed node chooses a neighbor independently and uniformly at random and informs it. We prove that w.h.p. this algorithm informs every node in the largest connected component of an RGG within Θ(n1/d/r+log n).
Evolutionary algorithms have been widely used to tackle multi-objective optimization problems. Incorporating preference information into the search of evolutionary algorithms for multi-objective optimization is of great importance as it allows to focus on interesting regions in the objective space. Zitzler et al. have shown how to use a weight distribution function on the objective space to incorporate preference information into hypervolume-based algorithms. We show that this weighted information can easily be used in other popular EMO algorithms as well. Our results for NSGA-II and SPEA2 show that this yields similar results to the hypervolume approach and requires less computational effort.
We give an algorithm that computes the final state of certain growth models without computing all intermediate states. Our technique is based on a “least action principle” which characterizes the odometer function of the growth process. Starting from an educated guess for the odometer, we successively correct under- and overestimates and provably arrive at the correct final state. The degree of speedup depends on the accuracy of the initial guess.
Determining the size of the boundary fluctuations in growth models like internal diffusion-limited aggregation (IDLA) is a long-standing open problem in statistical physics. As an application of our method, we calculate the size of fluctuations over two orders of magnitude beyond previous simulations. Our data strongly support the conjecture that the fluctuations of IDLA are logarithmic in the radius.
Multi-objective optimization problems arise frequently in applications but can often only be solved approximately by heuristic approaches. Evolutionary algorithms have been widely used to tackle multi-objective problems. These algorithms use different measures to ensure diversity in the objective space but are not guided by a formal notion of approximation.
We present a new framework of an evolutionary algorithm for multi-objective optimization that allows to work with a formal notion of approximation. Our experimental results show that our approach outperforms state-of-the-art evolutionary algorithms in terms of the quality of the approximation that is obtained in particular for problems with many objectives.
NOTE: The version published in the proceedings contains two errors: (1) The word “maximal” in line 20 of algorithm 4 should be replaced by “minimal”. (2) The dominance conditions in lines 3 and 4 of algorithm 2 have to be turned around: a<p becomes p>a, and a≥p becomes a≤p. The version available here is the corrected version.
We examine the complexity of constraint satisfaction problems (CSP) that consist of a set of AllDiff constraints. Such CSPs naturally model a wide range of real-world and combinatorial problems, like time-tabling, frequency allocations and graph coloring problems. As this problem is known to be NP-complete, we investigate under which further assumptions it becomes tractable. We observe that a crucial property seems to be the convexity of the variable domains and constraints. Our main contribution is an extensive study of the complexity of these Multiple AllDiff CSPs for a set natural parameters, like maximum domain size and maximum size of the constraint scopes. We show that, depending on the parameter, convexity can make the problem tractable while it is provably intractable in general.
The core of hypervolume-based multi-objective evolutionary algorithms is an archiving algorithm which performs the environmental selection. A (μ+λ)-archiving algorithm defines how to choose μ children from μ parents and λ offspring together. We study theoretically (μ+λ)-archiving algorithms which never decrease the hypervolume from one generation to the next.
Zitzler, Thiele, and Bader (IEEE Trans. Evolutionary Computation, 14:58-79, 2010) proved that all (μ+1)-archiving algorithms are ineffective, which means there is an initial population such that independent of the used reproduction rule, a set with maximum hypervolume cannot be reached. We extend this and prove that for λ<μ all archiving algorithms are ineffective. On the other hand, locally optimal algorithms, which maximize the hypervolume in each step, are effective for λ=μ and can always find a population with hypervolume at least half the optimum for λ<μ.
We also prove that there is no hypervolume-based archiving algorithm which can always find a population with hypervolume greater than 1/(1+0.1338(1/λ-1/μ)) times the optimum.
With the prevalence of social networks, it has become increasingly important to understand their features and limitations. It has been observed that information spreads extremely fast in social networks. We study the performance of randomized rumor spreading protocols on graphs in the preferential attachment model. The well-known random phone call model of Karp et al. (FOCS 2000) is a push-pull strategy where in each round, each vertex chooses a random neighbor and exchanges information with it. We prove the following.
• The push-pull strategy delivers a message to all nodes within Θ(log n) rounds with high probability. The best known bound so far was O(log2 n).
• If we slightly modify the protocol so that contacts are chosen uniformly from all neighbors but the one contacted in the previous round, then this time reduces to Θ(log(n)/log log n), which is the diameter of the graph. This is the first time that a sublogarithmic broadcast time is proven for a natural setting. Also, this is the first time that avoiding double-contacts reduces the run-time to a smaller order of magnitude.
We present a new randomized diffusion-based algorithm for balancing indivisible tasks (tokens) on a network. Our aim is to minimize the discrepancy between the maximum and minimum load. The algorithm works as follows. Every vertex distributes its tokens as evenly as possible among its neighbors and itself. If this is not possible without splitting some tokens, the vertex redistributes its excess tokens among all its neighbors randomly (without replacement).
In this paper we prove several upper bounds on the load discrepancy for general networks. These bounds depend on some expansion properties of the network, that is, the second largest eigenvalue, and a novel measure which we refer to as refined local divergence. We then apply these general bounds to obtain results for some specific networks. For constant-degree expanders and torus graphs, these yield exponential improvements on the discrepancy bounds compared to the algorithm of Rabani, Sinclair, and Wanka (1998). For hypercubes we obtain a polynomial improvement.
In contrast to previous papers, our algorithm is vertex-based and not edge-based. This means excess tokens are assigned to vertices instead to edges, and the vertex reallocates all of its excess tokens by itself. This approach avoids nodes having “negative loads”, but causes additional dependencies for the analysis.
In order to allow a comparison of (otherwise incomparable) sets, many evolutionary multiobjective optimizers use indicator functions to guide the search and to evaluate the performance of search algorithms. The most widely used indicator is the hypervolume indicator. It measures the volume of the dominated portion of the objective space.
Though the hypervolume indicator is very popular, it has not been shown that maximizing the hypervolume indicator is indeed equivalent to the overall objective of finding a good approximation of the Pareto front. To address this question, we compare the optimal approximation factor with the approximation factor achieved by sets maximizing the hypervolume indicator. We bound the optimal approximation factor of n points by 1+Θ(1/n) for arbitrary Pareto fronts. Furthermore, we prove that the same asymptotic approximation ratio is achieved by sets of n points that maximize the hypervolume indicator. This shows that the speed of convergence of the approximation ratio achieved by maximizing the hypervolume indicator is asymptotically optimal.
This implies that for large values of n, sets maximizing the hypervolume indicator quickly approach the optimal approximation ratio. Moreover, our bounds show that also for relatively small values of n, sets maximizing the hypervolume indicator achieve a near-optimal approximation ratio.
We propose a simple distributed algorithm for balancing indivisible tokens on graphs. The algorithm is completely deterministic, though it tries to imitate (and enhance) a random algorithm by keeping the accumulated rounding errors as small as possible.
Our new algorithm approximates the idealized process (where the tokens are divisible) on important network topologies surprisingly closely. On d-dimensional torus graphs with n nodes it deviates from the idealized process only by an additive constant. In contrast to that, the randomized rounding approach of Friedrich and Sauerwald (2009) can deviate up to Ω(polylog n) and the deterministic algorithm of Rabani, Sinclair, and Wanka (1998) has a deviation of Ω(n1/d). This makes our quasirandom algorithm the first known algorithm for this setting which is optimal both in time and achieved smoothness. We further show that also on the hypercube our algorithm has a smaller deviation from the idealized process than the previous algorithms.
To prove these results, we derive several combinatorial and probabilistic results that we believe to be of independent interest. In particular, we show that first-passage probabilities of a random walk on a path with arbitrary weights can be expressed as a convolution of independent geometric probability distributions.
Randomized rumor spreading is an efficient protocol to distribute information in networks. Recently, a quasirandom version has been proposed and proven to work equally well on many graphs and better for sparse random graphs. In this work we show three main results for the quasirandom rumor spreading model.
We exhibit a natural expansion property for networks which suffices to make quasirandom rumor spreading inform all nodes of the network in logarithmic time with high probability. This expansion property is satisfied, among others, by many expander graphs, random regular graphs, and Erdős-Rényi random graphs.
For all network topologies, we show that if one of the push or pull model works well, so does the other. We also show that quasirandom rumor spreading is robust against transmission failures. If each message sent out gets lost with probability f, then the runtime increases only by a factor of O(1/(1-f)).
We consider and analyze a new algorithm for balancing indivisible loads on a distributed network with n processors. The aim is minimizing the discrepancy between the maximum and minimum load. In every time-step paired processors balance their load as evenly as possible. The direction of the excess token is chosen according to a randomized rounding of the participating loads.
We prove that in comparison to the corresponding model of Rabani, Sinclair, and Wanka (1998) with arbitrary roundings, the randomization yields an improvement of roughly a square root of the achieved discrepancy in the same number of time-steps on all graphs. For the important case of expanders we can even achieve a constant discrepancy in O(log n (log log n)3) rounds. This is optimal up to loglog-factors while the best previous algorithms in this setting either require Ω(log2 n) time or can only achieve a logarithmic discrepancy. Our new result also demonstrates that with randomized rounding the difference between discrete and continuous load balancing vanishes almost completely.
The hypervolume indicator is an increasingly popular set measure to compare the quality of two Pareto sets. The basic ingredient of most hypervolume indicator based optimization algorithms is the calculation of the hypervolume contribution of single solutions regarding a Pareto set. We show that exact calculation of the hypervolume contribution is #P-hard while its approximation is NP-hard. The same holds for the calculation of the minimal contribution. We also prove that it is NP-hard to decide whether a solution has the least hypervolume contribution. Even deciding whether the contribution of a solution is at most (1+ε) times the minimal contribution is NP-hard. This implies that it is neither possible to efficiently find the least contributing solution (unless P=NP) nor to approximate it (unless NP=BPP).
Nevertheless, in the second part of the paper we present a very fast approximation algorithm for this problem. We prove that for arbitrarily given ε,δ>0 it calculates a solution with contribution at most (1+ε) times the minimal contribution with probability at least (1-δ). Though it cannot run in polynomial time for all instances, it performs extremely fast on various benchmark datasets. The algorithm solves very large problem instances which are intractable for exact algorithms (e.g., 10000 solutions in 100 dimensions) within a few seconds.
Runtime analysis of evolutionary algorithms has become an important part in the theoretical analysis of randomized search heuristics. After initial investigations on pseudo-Boolean functions, combinatorial optimization problems are now being studied. The first combinatorial problem where rigorous runtime results have been achieved is the well-known single source shortest path (SSSP) problem. Scharnow, Tinnefeld and Wegener [PPSN 2002, J. Math. Model. Alg. 2004] proposed a multi-objective approach which solves the problem in expected polynomial time. They also suggest a related single-objective fitness function. However, it was left open whether this does solve the problem efficiently, and, in a broader context, whether multi-objective fitness functions for problems like the SSSP yield more efficient evolutionary algorithms.
In this paper, we show that the single objective approach yields an efficient (1+1) EA with runtime bounds very close to those of the multi-objective approach.
We empirically analyze two versions of the well-known “randomized rumor spreading” protocol to disseminate a piece of information in networks. In the classical model, in each round each informed node informs a random neighbor. At SODA 2008, three of the authors proposed a quasirandom variant. Here, each node has a (cyclic) list of its neighbors. Once informed, it starts at a random position of the list, but from then on informs its neighbors in the order of the list.
While for sparse random graphs a better performance of the quasirandom model could be proven, all other results show that, independent of the structure of the lists, the same asymptotic performance guarantees hold as for the classical model.
In this work, we compare the two models experimentally. This not only shows that the quasirandom model generally is faster (which was expected, though maybe not to this extent), but also that the runtime is more concentrated around the mean value (which is surprising given that much fewer random bits are used in the quasirandom process).
These advantages are also observed in a lossy communication model, where each transmission does not reach its target with a certain probability, and in an asynchronous model, where nodes send at random times drawn from an exponential distribution. We also show that the particular structure of the lists has little influence on the efficiency. In particular, there is no problem if all nodes use an identical order to inform their neighbors.
We consider the computation of the volume of the union of high-dimensional geometric objects. While showing that this problem is #P-hard already for very simple bodies (i.e., axis-parallel boxes), we give a fast FPRAS for all objects where one can: (1) test whether a given point lies inside the object, (2) sample a point uniformly, (3) calculate the volume of the object in polynomial time. All three oracles can be weak, that is, just approximate. This implies that Klee's measure problem and the hypervolume indicator can be approximated efficiently even though they are #P-hard and hence cannot be solved exactly in time polynomial in the number of dimensions unless P=NP. Our algorithm also allows to approximate efficiently the volume of the union of convex bodies given by weak membership oracles.
For the analogous problem of the intersection of high-dimensional geometric objects we prove #P-hardness for boxes and show that there is no multiplicative polynomial-time 2d1-ε-approximation for certain boxes unless NP=BPP, but give a simple additive polynomial-time ε-approximation.
We propose and analyse a quasirandom analogue to the classical push model for disseminating information in networks (“randomized rumor spreading”).
In the classical model, in each round each informed node chooses a neighbor at random and informs it. Results of Frieze and Grimmett (Discrete Appl. Math. 1985) show that this simple protocol succeeds in spreading a rumor from one node of a complete graph to all others within O(log n) rounds. For the network being a hypercube or a random graph G(n,p) with p ≥ (1 + ε) (log n)/n, also O(log n) rounds suffice (Feige, Peleg, Raghavan, and Upfal, Random Struct. Algorithms 1990).
In the quasirandom model, we assume that each node has a (cyclic) list of its neighbors. Once informed, it starts at a random position of the list, but from then on informs its neighbors in the order of the list. Surprisingly, irrespective of the orders of the lists, the above mentioned bounds still hold. In addition, we also show a O(log n) bound for sparsely connected random graphs G(n,p) with p=(log n + f(n))/n, where f(n) --> ∞ and f(n)=O(log log n). Here, the classical model needs Θ(log2(n)) rounds.
Hence the quasirandom model achieves similar or better broadcasting times with a greatly reduced use of random bits.
Jim Propp's rotor router model is a deterministic analogue of a random walk on a graph. Instead of distributing chips randomly, each vertex serves its neighbors in a fixed order.
Cooper and Spencer (Comb. Probab. Comput. (2006)) show a remarkable similarity of both models. If an (almost) arbitrary population of chips is placed on the vertices of a grid Zd and does a simultaneous walk in the Propp model, then at all times and on each vertex, the number of chips deviates from the expected number the random walk would have gotten there, by at most a constant. This constant is independent of the starting configuration and the order in which each vertex serves its neighbors.
This result raises the question if all graphs do have this property. With quite some effort, we are now able to answer this question negatively. For the graph being an infinite k-ary tree (k≥3), we show that for any deviation D there is an initial configuration of chips such that after running the Propp model for a certain time there is a vertex with at least D more chips than expected in the random walk model. However, to achieve a deviation of D it is necessary that at least kΘ(D) vertices contribute by being occupied by a number of chips not divisible by k in a certain time interval.
We show several ways to round a real matrix to an integer one such that the rounding errors in all rows and columns as well as the whole matrix are less than one. This is a classical problem with applications in many fields, in particular, statistics.
We improve earlier solutions of different authors in two ways. For rounding matrices of size m*n, we reduce the runtime from O((mn)2) to O(mn log(mn)). Second, our roundings also have a rounding error of less than one in all initial intervals of rows and columns. Consequently, arbitrary intervals have an error of at most two. This is particularly useful in the statistics application of controlled rounding.
The same result can be obtained via (dependent) randomized rounding. This has the additional advantage that the rounding is unbiased, that is, for all entries yij of our rounding, we have E(yij)=xij, where xij is the corresponding entry of the input matrix.
We examine the complexity of constraint satisfaction problems that consist of a set of AllDiff constraints. Such CSPs naturally model a wide range of real-world and combinatorial problems, like scheduling, frequency allocations, and graph coloring problems. As this problem is known to be NP-complete, we investigate under which further assumptions it becomes tractable. We observe that a crucial property seems to be the convexity of the variable domains and constraints. Our main contribution is an extensive study of the complexity of Multiple AllDiff CSPs for a set of natural parameters, like maximum domain size and maximum size of the constraint scopes. We show that, depending on the parameter, convexity can make the problem tractable even though it is provably intractable in general. Interestingly, the convexity of constraints is the key property in achieving fixed parameter tractability, while the convexity of domains does not usually make the problem easier.
A random geometric graph (RGG) is defined by placing n points uniformly at random in the d-dimensional space [0,n1/d]d, and joining two points by an edge whenever their Euclidean distance is at most some fixed r. We assume that r is larger than the critical value for the emergence of a connected component with Ω(n) nodes. We show that, with high probability (w.h.p.), for any two connected nodes with a minimum Euclidean distance of ω(log n/rd-1), their graph distance is only a constant factor larger than their Euclidean distance. This implies that the diameter of the largest connected component is Θ(n1/d/r ) w.h.p. We also prove that the condition on the Euclidean distance above is essentially tight.
We also analyze the following randomized broadcast algorithm on RGGs. At the beginning, only one node from the largest connected component of the RGG is informed. Then, in each round, each informed node chooses a neighbor independently and uniformly at random and informs it. We prove that w.h.p. this algorithm informs every node in the largest connected component of an RGG within Θ(n1/d/r+log n).
In order to allow a comparison of (otherwise incomparable) sets, many evolutionary multiobjective optimizers use indicator functions to guide the search and to evaluate the performance of search algorithms. The most widely used indicator is the hypervolume indicator. It measures the volume of the dominated portion of the objective space bounded from below by a reference point. Though the hypervolume indicator is very popular, it has not been shown that maximizing the hypervolume indicator of sets of bounded size is indeed equivalent to the overall objective of finding a good approximation of the Pareto front. To address this question, we compare the optimal approximation ratio with the approximation ratio achieved by two-dimensional sets maximizing the hypervolume indicator. We bound the optimal multiplicative approximation ratio of n points by 1+Θ(1/n) for arbitrary Pareto fronts. Furthermore, we prove that the same asymptotic approximation ratio is achieved by sets of n points that maximize the hypervolume indicator. However, there is a provable gap between the two approximation ratios which is even exponential in the ratio between the largest and the smallest value of the front. We also examine the additive approximation ratio of the hypervolume indicator in two dimensions and prove that it achieves the optimal additive approximation ratio apart from a small ratio ≤ n/(n-2), where n is the size of the population. Hence the hypervolume indicator can be used to achieve a good additive but not a good multiplicative approximation of a Pareto front. This motivates the introduction of a “logarithmic hypervolume indicator” which provably achieves a good multiplicative approximation ratio.
We propose a simple distributed algorithm for balancing indivisible tokens on graphs. The algorithm is completely deterministic, though it tries to imitate (and enhance) a random algorithm by keeping the accumulated rounding errors as small as possible.
Our new algorithm approximates the idealized process (where the tokens are divisible) on important network topologies surprisingly closely. On d-dimensional torus graphs with n nodes it deviates from the idealized process only by an additive constant. In contrast to that, the randomized rounding approach of Friedrich and Sauerwald (2009) can deviate up to Ω(polylog n) and the deterministic algorithm of Rabani, Sinclair, and Wanka (1998) has a deviation of Ω(n1/d). This makes our quasirandom algorithm the first known algorithm for this setting which is optimal both in time and achieved smoothness. We further show that also on the hypercube our algorithm has a smaller deviation from the idealized process than the previous algorithms.
To prove these results, we derive several combinatorial and probabilistic results that we believe to be of independent interest. In particular, we show that first-passage probabilities of a random walk on a path with arbitrary weights can be expressed as a convolution of independent geometric probability distributions.
Evolutionary algorithms have been widely used to tackle multi-objective optimization problems. Incorporating preference information into the search of evolutionary algorithms for multi-objective optimization is of great importance as it allows to focus on interesting regions in the objective space. Zitzler et al. have shown how to use a weight distribution function on the objective space to incorporate preference information into hypervolume-based algorithms. We show that this weighted information can easily be used in other popular EMO algorithms as well. Our results for NSGA-II and SPEA2 show that this yields similar results to the hypervolume approach and requires less computational effort.
Systems biology has emerged over the last decade. Driven by the advances in sophisticated measurement technology the research community generated huge molecular biology data sets. This comprises rather static data on the interplay of biological entities, for instance protein-protein interaction network data, as well as quite dynamic data collected for studying the behavior of individual cells or tissues in accordance to changing environmental conditions, such as DNA microarrays or RNA sequencing. Here we bring the two different data types together in order to gain higher level knowledge. We introduce a significantly improved version of the KeyPathwayMiner software framework. Given a biological network modelled as graph and a set of expression studies, KeyPathwayMiner efficiently finds and visualizes connected sub-networks where most components are expressed in most cases. It finds all maximal connected sub-networks where all nodes but k exceptions are expressed in all experimental studies but at least l exceptions. We demonstrate the power of the new approach by comparing it to similar approaches with gene expression data previously used to study Huntington's disease. In addition, we demonstrate KeyPathwayMiner's flexibility and applicability to non-array data by analyzing genome-scale DNA methylation profiles from colorectal tumor cancer patients. KeyPathwayMiner release 2 is available as a Cytoscape plugin and online at keypathwayminer.mpi-inf.mpg.de.
Understanding structural and algorithmic properties of complex networks is an important task, not least because of the huge impact of the internet. Our focus is to analyze how news spreads in social networks. We simulate a simple information spreading process in different network topologies and demonstrate that news spreads much faster in existing social network topologies. We support this finding by analyzing information spreading in the mathematically defined preferential attachment network topology, which is a common model for real-world networks. We prove that here a sublogarithmic time suffices to spread a news to all nodes of the network. All previously studied network topologies need at least a logarithmic time. Surprisingly, we observe that nodes with few neighbors are crucial for the fast dissemination.
We give an algorithm that computes the final state of certain growth models without computing all intermediate states. Our technique is based on a “least action principle” which characterizes the odometer function of the growth process. Starting from an educated guess for the odometer, we successively correct under- and overestimates and provably arrive at the correct final state. The degree of speedup depends on the accuracy of the initial guess.
Determining the size of the boundary fluctuations in growth models like internal diffusion-limited aggregation (IDLA) is a long-standing open problem in statistical physics. As an application of our method, we calculate the size of fluctuations over two orders of magnitude beyond previous simulations. Our data strongly support the conjecture that the fluctuations of IDLA are logarithmic in the radius.
We empirically analyze two versions of the well-known “randomized rumor spreading” protocol to disseminate a piece of information in networks. In the classical model, in each round each informed node informs a random neighbor. In the recently proposed quasirandom variant, each node has a (cyclic) list of its neighbors. Once informed, it starts at a random position of the list, but from then on informs its neighbors in the order of the list.
While for sparse random graphs a better performance of the quasirandom model could be proven, all other results show that, independent of the structure of the lists, the same asymptotic performance guarantees hold as for the classical model.
In this work, we compare the two models experimentally. This not only shows that the quasirandom model generally is faster, but also that the runtime is more concentrated around the mean. This is surprising given that much fewer random bits are used in the quasirandom process.
These advantages are also observed in a lossy communication model, where each transmission does not reach its target with a certain probability, and in an asynchronous model, where nodes send at random times drawn from an exponential distribution. We also show that typically the particular structure of the lists has little influence on the efficiency.
The hypervolume indicator is an increasingly popular set measure to compare the quality of two Pareto sets. The basic ingredient of most hypervolume indicator based optimization algorithms is the calculation of the hypervolume contribution of single solutions regarding a Pareto set. We show that exact calculation of the hypervolume contribution is #P-hard while its approximation is NP-hard. The same holds for the calculation of the minimal contribution. We also prove that it is NP-hard to decide whether a solution has the least hypervolume contribution. Even deciding whether the contribution of a solution is at most (1+ε) times the minimal contribution is NP-hard. This implies that it is neither possible to efficiently find the least contributing solution (unless P=NP) nor to approximate it (unless NP=BPP).
Nevertheless, in the second part of the paper we present a fast approximation algorithm for this problem. We prove that for arbitrarily given ε,δ>0 it calculates a solution with contribution at most (1+ε) times the minimal contribution with probability at least (1-δ). Though it cannot run in polynomial time for all instances, it performs extremely fast on various benchmark datasets. The algorithm solves very large problem instances which are intractable for exact algorithms (e.g., 10000 solutions in 100 dimensions) within a few seconds.
It is widely assumed that evolutionary algorithms for multi-objective optimization problems should use certain mechanisms to achieve a good spread over the Pareto front. In this paper, we examine such mechanisms from a theoretical point of view and analyze simple algorithms incorporating the concept of fairness. This mechanism tries to balance the number of offspring of all individuals in the current population. We rigorously analyze the runtime behavior of different fairness mechanisms and present showcase examples to point out situations, where the right mechanism can speed up the optimization process significantly. We also indicate drawbacks for the use of fairness by presenting instances, where the optimization process is slowed down drastically.
The hypervolume indicator serves as a sorting criterion in many recent multi-objective evolutionary algorithms (MOEAs). Typical algorithms remove the solution with the smallest loss with respect to the dominated hypervolume from the population. We present a new algorithm which determines for a population of size n with d objectives, a solution with minimal hypervolume contribution in time O(nd/2 log n) for d≥2. This improves all previously published algorithms by a factor of n for all d≥3 and by a factor of √n for d=3.
We also analyze hypervolume indicator based optimization algorithms which remove λ≥1 solutions from a population of size n=μ+λ. We show that there are populations such that the hypervolume contribution of iteratively chosen λ solutions is much larger than the hypervolume contribution of an optimal set of λ solutions. Selecting the optimal set of λ solutions implies calculating binomial(n,μ) conventional hypervolume contributions, which is considered to be computationally too expensive. We present the first hypervolume algorithm which calculates directly the contribution of every set of λ solutions. This gives an additive term of binomial(n,μ) in the runtime of the calculation instead of a multiplicative factor of binomial(n,μ). More precisely, for a population of size n with d objectives, our algorithm can calculate a set of λ solutions with minimal hypervolume contribution in time O(nd/2 log n + nλ) for d≥2. This improves all previously published algorithms by a factor of nmin(λ,d/2) for d≥3 and by a factor of n for d=3.
We consider the computation of the volume of the union of high-dimensional geometric objects. While showing that this problem is #P-hard already for very simple bodies, we give a fast FPRAS for all objects where one can (1) test whether a given point lies inside the object, (2) sample a point uniformly, and (3) calculate the volume of the object in polynomial time. It suffices to be able to answer all three questions approximately. We show that this holds for a large class of objects. It implies that Klee's measure problem can be approximated efficiently even though it is #P-hard and hence cannot be solved exactly in time polynomial in the number of dimensions unless P=NP. Our algorithm also allows to efficiently approximate the volume of the union of convex bodies given by weak membership oracles.
For the analogous problem of the intersection of high-dimensional geometric objects we prove #P-hardness for boxes and show that there is no multiplicative polynomial-time 2d1-ε-approximation for certain boxes unless NP=BPP, but give a simple additive polynomial-time ε-approximation.
Jim Propp's rotor router model is a deterministic analogue of a random walk on a graph. Instead of distributing chips randomly, each vertex serves its neighbors in a fixed order.
Cooper and Spencer (Comb. Probab. Comput. (2006)) show a remarkable similarity of both models. If an (almost) arbitrary population of chips is placed on the vertices of a grid Zd and does a simultaneous walk in the Propp model, then at all times and on each vertex, the number of chips on this vertex deviates from the expected number the random walk would have gotten there by at most a constant. This constant is independent of the starting configuration and the order in which each vertex serves its neighbors.
This result raises the question if all graphs do have this property. With quite some effort, we are now able to answer this question negatively. For the graph being an infinite k-ary tree (k≥3), we show that for any deviation D there is an initial configuration of chips such that after running the Propp model for a certain time there is a vertex with at least D more chips than expected in the random walk model. However, to achieve a deviation of D it is necessary that at least exp(Ω(D2)) vertices contribute by being occupied by a number of chips not divisible by k at a certain time.