Jigsaw: Toward Conflict-free Vectorized Stencil Computation by Tessellating Swizzled Registers
Stencil computation plays a pivotal role in numerous scientific and engineering applications. Previous studies have extensively investigated vectorization techniques to enhance in-core parallelism; however, the performance bottleneck caused by data alignment conflicts (DAC) has not been effectively resolved in all dimensions. This paper proposes Jigsaw, a conflict-free vectorization method that reduces DAC across all dimensions by tessellating swizzled, finest-grained lanes. Jigsaw comprises three key components: Lane-based Butterfly Vectorization, SVD-based Dimension Flattening, and Iteration-based Temporal Merging. Together, these components address DAC across both spatial and temporal dimensions. Experimental results on different machines show that Jigsaw achieves an average speedup of 2.31x over state-of-the-art techniques on various stencil kernels.
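For readers unfamiliar with the problem, the following is a minimal sketch of where alignment conflicts come from in a vectorized stencil. It is a generic illustration and not the Jigsaw method: the 1D 3-point kernel, the function names, and the vector width of 4 lanes are all assumptions chosen for the example. Updating a block of outputs starting at index `i` needs three input vectors starting at `i - 1`, `i`, and `i + 1`, and at most one of those starts can fall on a vector-width boundary.

```python
def stencil_1d3p(a, c_left, c_mid, c_right):
    """Scalar reference: one sweep of a 3-point stencil (hypothetical kernel)."""
    out = list(a)  # boundary points are copied unchanged
    for i in range(1, len(a) - 1):
        out[i] = c_left * a[i - 1] + c_mid * a[i] + c_right * a[i + 1]
    return out


def vector_load_starts(i):
    """Start indices of the three vector loads needed to update the block at i."""
    return [i - 1, i, i + 1]


def count_loads(n, width):
    """Count (aligned, unaligned) vector loads over the interior blocks.

    A load is treated as aligned when its start index is a multiple of
    the vector width; this is a simplified model of hardware alignment.
    """
    aligned = unaligned = 0
    for i in range(width, n - width, width):
        for start in vector_load_starts(i):
            if start % width == 0:
                aligned += 1
            else:
                unaligned += 1
    return aligned, unaligned


if __name__ == "__main__":
    print(stencil_1d3p([0, 1, 2, 3, 4], 1, 1, 1))
    print(count_loads(32, 4))
```

In this simplified model, two of every three loads are misaligned for any width greater than 1, which is the kind of overhead that shuffle- or layout-based vectorization schemes try to avoid.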
Wed 5 Mar (displayed time zone: Pacific Time, US & Canada)
11:40 - 13:00 | Session 11: Parallel Algorithms and Applications (Session Chair: Weicong Chen) | Main Conference at Acacia D
11:40 | 20m | Talk | Jigsaw: Toward Conflict-free Vectorized Stencil Computation by Tessellating Swizzled Registers | Main Conference | Yiwei Zhang UCAS; Microsoft Research, Kun Li Microsoft Research, Liang Yuan Chinese Academy of Sciences, Haozhi Han Microsoft Research; Peking University, Yunquan Zhang Zhang, Ting Cao Microsoft Research, Mao Yang Microsoft Research
12:00 | 20m | Talk | Semi-StructMG: A Fast and Scalable Semi-Structured Algebraic Multigrid | Main Conference | Yi Zong Tsinghua University, Chensong Zhang Academy of Mathematics and Systems Science, Longjiang Mu Laoshan Laboratory, Jianchun Wang China Ship Scientific Research Center, Jian Sun CMA Earth System Modeling and Prediction Center, Xiaowen Xu Institute of Applied Physics and Computational Mathematics, Xinliang Wang Huawei Technologies Co., Ltd, Peinan Yu Tsinghua University, Wei Xue Tsinghua University
12:20 | 20m | Talk | SBMGT: Scaling Bayesian Multinomial Group Testing | Main Conference | Weicong Chen University of California, Merced, Hao Qi University of California, Merced, Curtis Tatsuoka University of Pittsburgh, Xiaoyi Lu UC Merced
12:40 | 20m | Talk | An AI-Enhanced 1km-Resolution Seamless Global Weather and Climate Model to Achieve Year-Scale Simulation Speed using 34 Million Cores | Main Conference | Xiaohui Duan Shandong University, Yi Zhang PIESAT Information Technology, Co. Ltd., Kai Xu Laoshan Laboratory, Haohuan Fu Tsinghua University, Bin Yang Tianjin University, Yiming Wang PIESAT Information Technology, Co. Ltd., Yilun Han Tsinghua University, Siyuan Chen PIESAT Information Technology, Co. Ltd., Zhuangzhuang Zhou National Supercomputing Center in Wuxi, Chenyu Wang National Supercomputing Center in Wuxi, Dongqiang Huang National Supercomputing Center in Wuxi, Huihai An Shandong University, Xiting Ju Tsinghua University, Haopeng Huang Tsinghua University, Zhuang Liu Tsinghua University, Wei Xue Tsinghua, Weiguo Liu Shandong University, Bowen Yan Tsinghua University, Jianye Hou The Chinese University of Hong Kong, Maoxue Yu Laoshan Laboratory, Wenguang Chen Tsinghua University; Pengcheng Laboratory, Jian Li Chinese Academy of Meteorological Sciences, Zhao Jing Laoshan Laboratory, Hailong Liu Laoshan Laboratory, Lixin Wu Laoshan Laboratory