BerryBees: Breadth First Search by Bit-Tensor-CoresDistinguished Paper AwardBest Artifact Award
Breadth First Search (BFS) plays a key role in computational science, networking, and artificial intelligence applications. Although the BFS approach has been extensively studied, particularly in its direction-optimized form, existing implementations still present three main issues: (1) high memory footprint; (2) the under-realized lightweight representations using bitmaps; and (3) the underuse of modern hardware such as Tensor Cores Units (TCUs).
In this paper, we propose BerryBees, an efficient algebraic BFS algorithm that leverages the Matrix Multiply-Accumulate (MMA) instructions of TCUs. The novelty of BerryBees lies in (1) the Binarized Row Slice (BRS) format, which encodes the adjacency matrices by using bitmaps to represent non-empty row segments; and (2) a warp-level algorithm that leverages TCUs for accelerating both SpMV and SpMSpV operations for enhanced BFS performance. The experimental results on the three latest NVIDIA GPU architectures show that BerryBees outperforms four state-of-the-art BFS methods: GAP, Gunrock, Enterprise and GraphBLAST, and delivers average speedups of 1.42 x, 1.97x, 5.05x, and 3.74x (up to 9.99x, 13.66x, 114.07x, and 24.74x), respectively.
Tue 4 MarDisplayed time zone: Pacific Time (US & Canada) change
14:00 - 15:20 | |||
14:00 20mTalk | FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores Main Conference Jinliang Shi Beijing University of Posts and Telecommunications, Shigang Li Beijing University of Posts and Telecommunications, Youxuan Xu Beijing University of Posts and Telecommunications, Rongtian Fu Beijing University of Posts and Telecommunications, Xueying Wang Beijing University of Posts and Telecommunications, Tong Wu Beijing University of Posts and Telecommunications | ||
14:20 20mTalk | Acc-SpMM: Accelerating General-purpose Sparse Matrix-Matrix Multiplication with GPU Tensor Cores Main Conference Haisha Zhao Computer Network Information Center, Chinese Academy of Sciences,University of Chinese Academy of Sciences, Li San Computer Network Information Center, Chinese Academy of Sciences,University of Chinese Academy of Sciences, Jiaheng Wang Renmin University of China, Chunbao Zhou Computer Network Information Center, Chinese Academy of Sciences, Jue Wang Computer Network Information Center, Chinese Academy of Sciences, Zhikuang Xin Computer Network Information Center, Chinese Academy of Sciences,University of Chinese Academy of Sciences, lishunde Computer Network Information Center, Chinese Academy of Sciences,University of Chinese Academy of Sciences, ZhiQiang Liang Computer Network Information Center, Chinese Academy of Sciences, Zhijie Pan Hangzhou Dianzi University, Fang Liu Computer Network Information Center, Chinese Academy of Sciences,University of Chinese Academy of Sciences, Yan Zeng Hangzhou Dianzi University, Yangang Wang Computer Network Information Center, Chinese Academy of Sciences, Xuebin Chi Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences | ||
14:40 20mTalk | BerryBees: Breadth First Search by Bit-Tensor-CoresDistinguished Paper AwardBest Artifact Award Main Conference Yuyao Niu Barcelona Supercomputing Center (BSC) - Universitat Politècnica de Catalunya (UPC), Marc Casas Barcelona Supercomputing Center | ||
15:00 20mTalk | FlashFFTStencil: Bridging Fast Fourier Transforms to Memory-Efficient Stencil Computations on Tensor Core Units Main Conference Haozhi Han Microsoft Research; Peking University, Kun Li Microsoft Research, Wei Cui Microsoft Research, Donglin Bai Microsoft Research, Yiwei Zhang UCAS; Microsoft Research, Liang Yuan Chinese Academy of Sciences, Yifeng Cheng Peking University, Yunquan Zhang Zhang, Ting Cao Microsoft Research, Mao Yang Microsoft Research |