GLUMIN: Fast Connectivity Check Based on LUTs For Efficient Graph Pattern Mining
Graph Pattern Mining (GPM) has made significant progress in recent years, opening up a range of big data applications. However, GPM applications are memory-intensive as they require a tremendous amount of pair checking, which involves enumerating all possible vertex pairs and checking their connectivity or counting the common neighbors of each pair. Adjacent list or CSR format is widely used in recent GPM systems to overcome the sparsity of real-world graph, at the cost of sacrificing the ability to query the connectivity of two vertices in 𝑂 (1) compared with raw adjacent matrix. In this paper, we rethink the format to store graph in GPM systems, by partial ‘restoring’ graph from CSR to adjacent matrix, we can significantly accelerate checking connectivity or counting common neighbors during searching process. However, directly storing graph in adjacent matrix is cost-prohibitive, thus we propose GLUMIN, by selectively constructing lookup table (LUT) at the runtime, which holds the connectivity that can be repeatedly retrieved at low-cost. The LUTs are compressed and tailored to fit the hierarchical memory architecture like GPU shared memory or CPU cache. We adapt GLUMIN technique to several SOTA GPM systems like AutoMine, G2Miner and GraphFold, bring two order of magnitude speedup to them on 135 different types graphs and 24 patterns, demonstrating the general applicability and significance of our methods.