Faster maximal clique enumeration in large real-world link streams

Read original: arXiv:2302.00360 - Published 5/27/2024 by Alexis Baudin, Cl'emence Magnien, Lionel Tabourier

📉

Overview

Link streams provide a model for representing interactions over time
This paper focuses on the problem of enumerating maximal cliques in link streams
Maximal cliques are sets of vertices that all interact pairwise during a full time interval
The paper proposes a new algorithm for solving this problem, building on the Bron-Kerbosch algorithm for enumerating maximal cliques in graphs

Plain English Explanation

Link streams offer a way to represent how things interact over time. A link stream consists of a series of links, each describing an interaction between two entities (like people or devices) during a certain time period.

The paper looks at the problem of finding all the "maximal cliques" in a link stream. A maximal clique is a set of entities that all interact with each other continuously over some time interval. For example, it could be a group of friends who are all in constant communication during a certain period.

The authors build on a well-known algorithm called Bron-Kerbosch that is used to find maximal cliques in regular graphs. They propose a new algorithm that takes this idea and adapts it to work specifically for link streams. Their algorithm is able to efficiently match up cliques in the "snapshots" of the network at each time point with the maximal cliques in the overall link stream.

The key innovation is that their algorithm is faster than previous state-of-the-art approaches, especially for very large link streams with millions of interactions. This could be useful for applications like analyzing social networks or detecting patterns in communication networks.

Technical Explanation

The authors build on the Bron-Kerbosch algorithm, a well-known method for enumerating maximal cliques in static graphs. They propose a new algorithm that matches the cliques found in the "instantaneous" graphs formed by the links present at each time point to the maximal cliques in the overall link stream.

The key steps of their approach are:

Construct the sequence of instantaneous graphs, one for each time point in the link stream.
Use Bron-Kerbosch to find the maximal cliques in each instantaneous graph.
Match up these cliques across time points to identify the maximal cliques in the full link stream.

The authors prove the validity of this approach and analyze its time complexity, showing it is better than previous state-of-the-art algorithms in many practical scenarios. They also study the "output-sensitive" complexity, which is closely tied to the size of the final output, demonstrating the efficiency of their method.

To validate their claims, the authors run experiments on both standard benchmark link stream datasets as well as massive real-world datasets with up to 100 million links. In all cases, their algorithm significantly outperforms existing techniques, often by at least an order of magnitude and up to 10,000x in some instances. This allows their approach to scale to link streams that were previously intractable for other methods.

Critical Analysis

The paper provides a thorough theoretical analysis and extensive experimental evaluation of the proposed algorithm. The authors acknowledge some limitations, such as the fact that their approach assumes the link stream is provided as input, rather than being generated dynamically.

One potential area for further research could be adapting the algorithm to work in more interactive or streaming settings, where the link stream is revealed incrementally over time. This could make the method more applicable to real-time applications like social network analysis or anomaly detection in communication networks.

Additionally, the authors focus solely on enumerating maximal cliques, but there may be other interesting properties of link streams that could be efficiently extracted using a similar decomposition-based approach. Exploring these other analysis tasks could broaden the applicability of the core techniques.

Overall, the paper presents a novel and highly efficient algorithm for a well-studied problem in link stream analysis. The strong theoretical guarantees and empirical performance improvements demonstrated in the work make it a valuable contribution to the field.

Conclusion

This paper introduces a new algorithm for enumerating maximal cliques in link streams, which model interactions over time. By building on the Bron-Kerbosch algorithm for static graphs and adapting it to the link stream setting, the authors are able to develop a method that significantly outperforms previous state-of-the-art approaches.

The key innovation is the ability to efficiently match up cliques found in the "snapshots" of the network at each time point with the true maximal cliques in the overall link stream. This allows the algorithm to scale to massive datasets with millions of interactions, opening up new possibilities for applications in areas like social network analysis and communication network monitoring.

While the paper focuses specifically on the maximal clique enumeration problem, the general decomposition-based approach could potentially be applied to other important link stream analysis tasks. Exploring these extensions and further improving the algorithm's capabilities in streaming or dynamic settings could be fruitful directions for future research.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

📉

Faster maximal clique enumeration in large real-world link streams

Alexis Baudin, Cl'emence Magnien, Lionel Tabourier

Link streams offer a good model for representing interactions over time. They consist of links $(b,e,u,v)$, where $u$ and $v$ are vertices interacting during the whole time interval $[b,e]$. In this paper, we deal with the problem of enumerating maximal cliques in link streams. A clique is a pair $(C,[t_0,t_1])$, where $C$ is a set of vertices that all interact pairwise during the full interval $[t_0,t_1]$. It is maximal when neither its set of vertices nor its time interval can be increased. Some of the main works solving this problem are based on the famous Bron-Kerbosch algorithm for enumerating maximal cliques in graphs. We take this idea as a starting point to propose a new algorithm which matches the cliques of the instantaneous graphs formed by links existing at a given time $t$ to the maximal cliques of the link stream. We prove its validity and compute its complexity, which is better than the state-of-the art ones in many cases of interest. We also study the output-sensitive complexity, which is close to the output size, thereby showing that our algorithm is efficient. To confirm this, we perform experiments on link streams used in the state of the art, and on massive link streams, up to 100 million links. In all cases our algorithm is faster, mostly by a factor of at least 10 and up to a factor of $10^4$. Moreover, it scales to massive link streams for which the existing algorithms are not able to provide the solution.

5/27/2024

Efficient Enumeration of Large Maximal k-Plexes

Qihao Cheng, Da Yan, Tianhao Wu, Lyuheng Yuan, Ji Cheng, Zhongyi Huang, Yang Zhou

Finding cohesive subgraphs in a large graph has many important applications, such as community detection and biological network analysis. Clique is often a too strict cohesive structure since communities or biological modules rarely form as cliques for various reasons such as data noise. Therefore, $k$-plex is introduced as a popular clique relaxation, which is a graph where every vertex is adjacent to all but at most $k$ vertices. In this paper, we propose a fast branch-and-bound algorithm as well as its task-based parallel version to enumerate all maximal $k$-plexes with at least $q$ vertices. Our algorithm adopts an effective search space partitioning approach that provides a lower time complexity, a new pivot vertex selection method that reduces candidate vertex size, an effective upper-bounding technique to prune useless branches, and three novel pruning techniques by vertex pairs. Our parallel algorithm uses a timeout mechanism to eliminate straggler tasks, and maximizes cache locality while ensuring load balancing. Extensive experiments show that compared with the state-of-the-art algorithms, our sequential and parallel algorithms enumerate large maximal $k$-plexes with up to $5 times$ and $18.9 times$ speedup, respectively. Ablation results also demonstrate that our pruning techniques bring up to $7 times$ speedup compared with our basic algorithm.

6/11/2024

🔍

A Fast Maximum Clique Algorithm Based on Network Decomposition for Large Sparse Networks

Tianlong Fan, Wenjun Jiang, Yi-Cheng Zhang, Linyuan Lu

Finding maximum cliques in large networks is a challenging combinatorial problem with many real-world applications. We present a fast algorithm to achieve the exact solution for the maximum clique problem in large sparse networks based on efficient graph decomposition. A bunch of effective techniques is being used to greatly prune the graph and a novel concept called Complete-Upper-Bound-Induced Subgraph (CUBIS) is proposed to ensure that the structures with the potential to form the maximum clique are retained in the process of graph decomposition. Our algorithm first pre-prunes peripheral nodes, subsequently, one or two small-scale CUBISs are constructed guided by the core number and current maximum clique size. Bron-Kerbosch search is performed on each CUBIS to find the maximum clique. Experiments on 50 empirical networks with a scale of up to 20 million show the CUBIS scales are largely independent of the original network scale. This enables an approximately linear runtime, making our algorithm amenable for large networks. Our work provides a new framework for effectively solving maximum clique problems on massive sparse graphs, which not only makes the graph scale no longer the bottleneck but also shows some light on solving other clique-related problems.

4/22/2024

BBK: a simpler, faster algorithm for enumerating maximal bicliques in large sparse bipartite graphs

Alexis Baudin, Cl'emence Magnien, Lionel Tabourier

Bipartite graphs are a prevalent modeling tool for real-world networks, capturing interactions between vertices of two different types. Within this framework, bicliques emerge as crucial structures when studying dense subgraphs: they are sets of vertices such that all vertices of the first type interact with all vertices of the second type. Therefore, they allow identifying groups of closely related vertices of the network, such as individuals with similar interests or webpages with similar contents. This article introduces a new algorithm designed for the exhaustive enumeration of maximal bicliques within a bipartite graph. This algorithm, called BBK for Bipartite Bron-Kerbosch, is a new extension to the bipartite case of the Bron-Kerbosch algorithm, which enumerates the maximal cliques in standard (non-bipartite) graphs. It is faster than the state-of-the-art algorithms and allows the enumeration on massive bipartite graphs that are not manageable with existing implementations. We analyze it theoretically to establish two complexity formulas: one as a function of the input and one as a function of the output characteristics of the algorithm. We also provide an open-access implementation of BBK in C++, which we use to experiment and validate its efficiency on massive real-world datasets and show that its execution time is shorter in practice than state-of-the art algorithms. These experiments also show that the order in which the vertices are processed, as well as the choice of one of the two types of vertices on which to initiate the enumeration have an impact on the computation time.

5/27/2024