The Louvain algorithm is one of the most popular algorithms for community detection. Observing that existing implementations suffer from inaccurate pruning and inefficient intermediate state management, we introduce GALA, \underline{G}PU-\underline{A}ccelerated \underline{L}ouvain \underline{A}lgorithm, which incorporates two key innovations. The first innovation is a novel modularity gain-based pruning strategy, supported by rigorous theoretical guarantees of optimality and able to reduce up to 76% of vertices as well as their corresponding computations. To take advantage of the memory hierarchy and parallelism of GPUs, the second innovation is workload-aware kernels, featuring a shuffle-based kernel founded on the warp-level primitives for exchange states and a hash-based kernel that prioritizes shared memory in hashtable design. GALA further scales to multiple GPUs by minimizing the synchronization overhead between GPUs through a dense-sparse synchronization strategy. We evaluate the performance of GALA through theoretical analysis and practical experiments on various real-world graphs. The experimental results confirm that GALA significantly improves the performance of the parallel Louvain algorithm on GPUs, surpassing state-of-the-art solutions by $6\times$ on average.

Wed 5 Mar

Displayed time zone: Pacific Time (US & Canada) change

10:00 - 11:20
Session 10: GPU II (Session Chair: Zhijia Zhao)Main Conference at Acacia D
10:00
20m
Talk
Popcorn: Accelerating Kernel K-means on GPUs through Sparse Linear Algebra
Main Conference
Julian Bellavita Cornell University, Thomas Pasquali University of Trento, Laura Del Rio University of Trento, Flavio Vella Free University of Bozen, Giulia Guidi Cornell University
10:20
20m
Talk
Swift Unfolding of Communities: GPU-Accelerated Louvain Algorithm
Main Conference
Zhibin Wang Nanjing University, Xi Lin Nanjing University, Xue Li Alibaba Group, Pinhuan Wang Rutgers, The State University of New Jersey, Ziheng Meng Nanjing University, Hang Liu Rutgers, The State University of New Jersey, Chen Tian Nanjing University, Sheng Zhong Nanjing University
10:40
20m
Talk
GLUMIN: Fast Connectivity Check Based on LUTs For Efficient Graph Pattern Mining
Main Conference
Weichen Cao Institute of Computing Technology, Chinese Academy of Sciences, Ke Meng Chinese Academy of Sciences, linzhiheng Institute of Computing Technology, Chinese Academy of Sciences, Guangming Tan Chinese Academy of Sciences(CAS)
11:00
20m
Talk
Improving Tridiagonalization Performance on GPU Architectures
Main Conference
WangHansheng University of Electronic Science and Technology of China, Zhekai Duan University of Edinburgh, Zitian Zhao University of Electronic Science and Technology of China, Siqi Wu University of Electronic Science and Technology of China, Saiqi Zheng Xi'an Jiaotong-Liverpool University, Qiao Li University of Electronic Science and Technology of China, Xu Jiang University of Electronic Science and Technology of China, Shaoshuai Zhang