Several strategies have been proposed to parallelize graph neural network (GNN) training over multiple GPUs. These strategies are logically equivalent (i.e., they produce the same model) but partition data and assign computation to the GPUs in different ways. We observe that there is no consistent winner (i.e., no single strategy always achieves the shortest running time), and the optimal strategy depends on the graph dataset, GNN model, training algorithm, and hardware configuration. As such, we design the APT system to automatically select efficient parallelization strategies for GNN training tasks. In particular, we analyze the trade-offs of the strategies and design simple yet effective cost models to compare their execution times and facilitate strategy selection. Moreover, we propose a general abstraction of the strategies, which allows us to implement a unified execution engine that can be configured to run different strategies. Our experiments show that APT selects the optimal strategy across various task configurations, and the training time can be reduced by over 2x compared with always using a single strategy. APT is anonymously open-sourced at https://anonymous.4open.science/r/APT-1CAB.
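The abstract's core idea, estimating each strategy's execution time with a cost model and running the cheapest one, can be illustrated with a minimal sketch. This is a hypothetical toy model, not APT's actual cost model or API: the strategy names, `TaskConfig` fields, and cost formulas are invented for illustration, but they show how the best strategy flips depending on the task configuration.

```python
# Hypothetical sketch of cost-model-driven strategy selection:
# estimate each candidate strategy's per-iteration time, pick the minimum.
# All names and formulas here are illustrative, not from the APT paper.

from dataclasses import dataclass

@dataclass
class TaskConfig:
    num_gpus: int      # number of GPUs used for training
    feature_dim: int   # node feature dimension
    avg_fanout: int    # average neighbors sampled per node

def estimate_cost(strategy: str, cfg: TaskConfig) -> float:
    """Toy cost = compute time + cross-GPU communication time."""
    # Compute is split evenly across GPUs in both strategies.
    compute = cfg.avg_fanout * cfg.feature_dim / cfg.num_gpus
    if strategy == "data_parallel":
        # Graph replicated on each GPU; communication is gradient sync,
        # which scales with model/feature size only.
        comm = cfg.feature_dim
    else:  # "graph_partitioned"
        # Graph partitioned across GPUs; communication fetches a fraction
        # of remote neighbor features (assume 10% are remote).
        comm = cfg.avg_fanout * cfg.feature_dim * 0.1
    return compute + comm

def select_strategy(cfg: TaskConfig) -> str:
    strategies = ["data_parallel", "graph_partitioned"]
    return min(strategies, key=lambda s: estimate_cost(s, cfg))
```

With a high sampling fanout, remote-feature traffic dominates and data parallelism wins; with a low fanout, gradient synchronization dominates and graph partitioning wins, mirroring the paper's observation that no strategy is a consistent winner.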
Session: Mon 3 Mar, 10:00 - 11:00 (Pacific Time, US & Canada)
10:00 (20m) Talk: Helios: Efficient Distributed Dynamic Graph Sampling for Online GNN Inference (Main Conference). Jie Sun (Zhejiang University), Zuocheng Shi (Zhejiang University), Li Su (Alibaba Group), Wenting Shen (Alibaba Group), Zeke Wang (Zhejiang University), Yong Li (Alibaba Group), Wenyuan Yu (Alibaba Group), Wei Lin (Alibaba Group), Fei Wu (College of Computer Science and Technology, Zhejiang University), Jingren Zhou (Alibaba Group), Bingsheng He (National University of Singapore)

10:20 (20m) Talk: Accelerating GNNs on GPU Sparse Tensor Cores through N:M Sparsity-Oriented Graph Reordering (Main Conference). Jou-An Chen (North Carolina State University), Hsin-Hsuan Sung (North Carolina State University), Ruifeng Zhang (North Carolina State University), Ang Li (Pacific Northwest National Laboratory), Xipeng Shen (North Carolina State University)

10:40 (20m) Talk: Adaptive Parallel Training for Graph Neural Networks (Main Conference). Kaihao Ma (The Chinese University of Hong Kong), Renjie Liu (Southern University of Science and Technology), Xiao Yan (Centre for Perceptual and Interactive Intelligence (CPII)), Zhenkun Cai (Amazon), Xiang Song (Amazon Web Services), Minjie Wang (Amazon Web Services), Yichao Li (The Chinese University of Hong Kong), James Cheng (The Chinese University of Hong Kong)