Stencil computation plays a pivotal role in numerous scientific and engineering applications. Previous studies have extensively investigated vectorization techniques to enhance in-core parallelism; however, the performance bottleneck caused by data alignment conflicts (DAC) has not been effectively resolved in all dimensions. This paper proposes Jigsaw, a conflict-free vectorization method that reduces DAC across all dimensions by tessellating swizzled finest-grained lanes. Jigsaw comprises three key components: Lane-based Butterfly Vectorization, SVD-based Dimension Flattening, and Iteration-based Temporal Merging. Together, these components address DAC across both spatial and temporal dimensions. Experimental results on different machines demonstrate that Jigsaw achieves a significant improvement over state-of-the-art techniques, with an average speedup of 2.31x on various stencil kernels.

Wed 5 Mar

Displayed time zone: Pacific Time (US & Canada)

11:40 - 13:00
Session 11: Parallel Algorithms and Applications (Session Chair: Weicong Chen), Main Conference at Acacia D
11:40
20m
Talk
Jigsaw: Toward Conflict-free Vectorized Stencil Computation by Tessellating Swizzled Registers
Main Conference
Yiwei Zhang UCAS; Microsoft Research, Kun Li Microsoft Research, Liang Yuan Chinese Academy of Sciences, Haozhi Han Microsoft Research; Peking University, Yunquan Zhang, Ting Cao Microsoft Research, Mao Yang Microsoft Research
12:00
20m
Talk
Semi-StructMG: A Fast and Scalable Semi-Structured Algebraic Multigrid
Main Conference
Yi Zong Tsinghua University, Chensong Zhang Academy of Mathematics and Systems Science, Longjiang Mu Laoshan Laboratory, Jianchun Wang China Ship Scientific Research Center, Jian Sun CMA Earth System Modeling and Prediction Center, Xiaowen Xu Institute of Applied Physics and Computational Mathematics, Xinliang Wang Huawei Technologies Co., Ltd, Peinan Yu Tsinghua University, Wei Xue Tsinghua University
12:20
20m
Talk
SBMGT: Scaling Bayesian Multinomial Group Testing
Main Conference
Weicong Chen University of California, Merced, Hao Qi University of California, Merced, Curtis Tatsuoka University of Pittsburgh, Xiaoyi Lu UC Merced
12:40
20m
Talk
An AI-Enhanced 1km-Resolution Seamless Global Weather and Climate Model to Achieve Year-Scale Simulation Speed using 34 Million Cores
Main Conference
Xiaohui Duan Shandong University, Yi Zhang PIESAT Information Technology Co., Ltd., Kai Xu Laoshan Laboratory, Haohuan Fu Tsinghua University, Bin Yang Tianjin University, Yiming Wang PIESAT Information Technology Co., Ltd., Yilun Han Tsinghua University, Siyuan Chen PIESAT Information Technology Co., Ltd., Zhuangzhuang Zhou National Supercomputing Center in Wuxi, Chenyu Wang National Supercomputing Center in Wuxi, Dongqiang Huang National Supercomputing Center in Wuxi, Huihai An Shandong University, Xiting Ju Tsinghua University, Haopeng Huang Tsinghua University, Zhuang Liu Tsinghua University, Wei Xue Tsinghua University, Weiguo Liu Shandong University, Bowen Yan Tsinghua University, Jianye Hou The Chinese University of Hong Kong, Maoxue Yu Laoshan Laboratory, Wenguang Chen Tsinghua University; Pengcheng Laboratory, Jian Li Chinese Academy of Meteorological Sciences, Zhao Jing Laoshan Laboratory, Hailong Liu Laoshan Laboratory, Lixin Wu Laoshan Laboratory