Breadth First Search (BFS) plays a key role in computational science, networking, and artificial intelligence applications. Although BFS has been extensively studied, particularly in its direction-optimized form, existing implementations still have three main issues: (1) a high memory footprint; (2) underexploited lightweight bitmap representations; and (3) underuse of modern hardware such as Tensor Core Units (TCUs).

In this paper, we propose BerryBees, an efficient algebraic BFS algorithm that leverages the Matrix Multiply-Accumulate (MMA) instructions of TCUs. The novelty of BerryBees lies in (1) the Binarized Row Slice (BRS) format, which encodes adjacency matrices using bitmaps to represent non-empty row segments; and (2) a warp-level algorithm that uses TCUs to accelerate both the SpMV and SpMSpV operations in BFS. Experimental results on the three latest NVIDIA GPU architectures show that BerryBees outperforms four state-of-the-art BFS methods (GAP, Gunrock, Enterprise, and GraphBLAST), delivering average speedups of 1.42x, 1.97x, 5.05x, and 3.74x (up to 9.99x, 13.66x, 114.07x, and 24.74x), respectively.
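The abstract describes BRS and the SpMV/SpMSpV kernels only at a high level. The following is a minimal CPU-side Python sketch of the underlying idea, algebraic BFS over the Boolean semiring on bitmap-encoded row segments, and not BerryBees' actual implementation. The segment width `SLICE`, the dict-of-bitmaps layout, the helper names `encode_row_slices`, `push_step`, `pull_step`, and `bfs`, and the `n // 8` direction-switching threshold are all hypothetical. The sketch also assumes an undirected graph, so the matrix is symmetric. It does not model the warp-level TCU MMA execution that the paper targets.

```python
from collections import defaultdict

SLICE = 8                  # hypothetical segment width in bits (assumption)
MASK = (1 << SLICE) - 1    # selects one SLICE-wide segment of a bitmap


def encode_row_slices(n, edges):
    """Per row, keep only non-empty column segments as {slice_index: bitmap}.

    Edges are treated as undirected (assumption), so the matrix is symmetric
    and the same rows serve both the push and the pull step.
    """
    rows = [defaultdict(int) for _ in range(n)]
    for u, v in edges:
        rows[u][v // SLICE] |= 1 << (v % SLICE)
        rows[v][u // SLICE] |= 1 << (u % SLICE)
    return [dict(r) for r in rows]


def push_step(rows, frontier):
    """SpMSpV-style step: OR together the row segments of every frontier vertex."""
    nxt = 0
    f = frontier
    while f:
        low = f & -f              # isolate the lowest set bit
        u = low.bit_length() - 1  # vertex id of that bit
        for s, bits in rows[u].items():
            nxt |= bits << (s * SLICE)
        f ^= low                  # clear that bit and continue
    return nxt


def pull_step(rows, frontier, visited, n):
    """SpMV-style step: each unvisited vertex checks whether any of its
    non-empty segments intersects the matching segment of the frontier."""
    nxt = 0
    for v in range(n):
        if visited >> v & 1:
            continue
        for s, bits in rows[v].items():
            if (frontier >> (s * SLICE)) & MASK & bits:
                nxt |= 1 << v
                break             # one frontier neighbour is enough
    return nxt


def bfs(n, rows, source, mode="auto"):
    """Return the BFS level of every vertex (-1 if unreachable).

    mode is "push", "pull", or "auto" (hypothetical size-based switch).
    """
    level = [-1] * n
    level[source] = 0
    frontier = visited = 1 << source
    depth = 0
    while frontier:
        depth += 1
        size = bin(frontier).count("1")
        use_pull = mode == "pull" or (mode == "auto" and size > n // 8)
        if use_pull:
            nxt = pull_step(rows, frontier, visited, n)
        else:
            nxt = push_step(rows, frontier) & ~visited  # mask visited vertices
        for v in range(n):
            if nxt >> v & 1:
                level[v] = depth
        visited |= nxt
        frontier = nxt
    return level
```

For example, `bfs(12, encode_row_slices(12, [(0, 1), (1, 2), (2, 3), (0, 9), (9, 10), (4, 5)]), 0)` assigns level 1 to vertices 1 and 9, level 2 to vertices 2 and 10, and level 3 to vertex 3. In this sketch the push step does work proportional to the frontier (as in SpMSpV), while the pull step scans all unvisited vertices (as in SpMV); choosing between them per level is the trade-off that direction-optimized BFS exploits.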

Tue 4 Mar

Displayed time zone: Pacific Time (US & Canada)

14:00 - 15:20
Session 8: Tensor Cores (Session Chair: Jeffrey Vetter), Main Conference at Acacia D
14:00
20m
Talk
FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores
Main Conference
Jinliang Shi Beijing University of Posts and Telecommunications, Shigang Li Beijing University of Posts and Telecommunications, Youxuan Xu Beijing University of Posts and Telecommunications, Rongtian Fu Beijing University of Posts and Telecommunications, Xueying Wang Beijing University of Posts and Telecommunications, Tong Wu Beijing University of Posts and Telecommunications
14:20
20m
Talk
Acc-SpMM: Accelerating General-purpose Sparse Matrix-Matrix Multiplication with GPU Tensor Cores
Main Conference
Haisha Zhao Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Li San Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Jiaheng Wang Renmin University of China, Chunbao Zhou Computer Network Information Center, Chinese Academy of Sciences, Jue Wang Computer Network Information Center, Chinese Academy of Sciences, Zhikuang Xin Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, lishunde Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, ZhiQiang Liang Computer Network Information Center, Chinese Academy of Sciences, Zhijie Pan Hangzhou Dianzi University, Fang Liu Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Yan Zeng Hangzhou Dianzi University, Yangang Wang Computer Network Information Center, Chinese Academy of Sciences, Xuebin Chi Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences
14:40
20m
Talk
BerryBees: Breadth First Search by Bit-Tensor-Cores (Distinguished Paper Award, Best Artifact Award)
Main Conference
Yuyao Niu Barcelona Supercomputing Center (BSC) - Universitat Politècnica de Catalunya (UPC), Marc Casas Barcelona Supercomputing Center
15:00
20m
Talk
FlashFFTStencil: Bridging Fast Fourier Transforms to Memory-Efficient Stencil Computations on Tensor Core Units
Main Conference
Haozhi Han Microsoft Research; Peking University, Kun Li Microsoft Research, Wei Cui Microsoft Research, Donglin Bai Microsoft Research, Yiwei Zhang UCAS; Microsoft Research, Liang Yuan Chinese Academy of Sciences, Yifeng Cheng Peking University, Yunquan Zhang Zhang, Ting Cao Microsoft Research, Mao Yang Microsoft Research