Search events for 'all'
Jigsaw: Toward Conflict-free Vectorized Stencil Computation by Tessellating Swizzled Registers
Main Conference When: Wed 5 Mar 2025 11:40 - 12:00 People: Yiwei Zhang, Kun Li, Liang Yuan, Haozhi Han, Yunquan Zhang, Ting Cao, Mao Yang
… caused by data alignment conflicts (DAC) has not been effectively resolved in all … to reduce DAC across all dimensions by tessellating swizzled finest-grained lanes …
Semi-StructMG: A Fast and Scalable Semi-Structured Algebraic Multigrid
Main Conference When: Wed 5 Mar 2025 12:00 - 12:20 People: Yi Zong, Chensong Zhang, Longjiang Mu, Jianchun Wang, Jian Sun, Xiaowen Xu, Xinliang Wang, Peinan Yu, Wei Xue
… the fastest time-to-solution across all cases, with average speedups of 5.97x, 15.2x …-StructMG significantly improves both strong and weak scaling efficiencies in all …
Aggregating Funnels for Faster Fetch&Add and Queues
Main Conference When: Mon 3 Mar 2025 14:20 - 14:40 People: Younghun Roh, Yuanhao Wei, Eric Ruppert, Panagiota Fatourou, Siddhartha Jayanti, Julian Shun
… be performed by a single hardware fetch-and-add instruction on one location and all …
ATTNChecker: Highly-Optimized Fault Tolerant Attention for Large Language Model Training
Main Conference When: Tue 4 Mar 2025 10:40 - 11:00 People: Yuhang Liang, Xinyi Li, Jie Ren, Ang Li, Bo Fang, Jieyang Chen
… all extreme errors. Compared with the state-of-the-art checkpoint/restore …
GLUMIN: Fast Connectivity Check Based on LUTs For Efficient Graph Pattern Mining
Main Conference When: Wed 5 Mar 2025 10:40 - 11:00 People: Weichen Cao, Ke Meng, linzhiheng, Guangming Tan
… enumerating all possible vertex pairs and checking their connectivity or counting …
COMPSO: Optimizing Gradient Compression for Distributed Training with Second-Order Optimizers
Main Conference When: Mon 3 Mar 2025 17:40 - 18:00 People: Baixi Sun, Weijin Liu, J. Gregory Pauloski, Jiannan Tian, Jinda Jia, Daoce Wang, Boyuan Zhang, Mingkai Zheng, Sheng Di, Sian Jin, Zhao Zhang, Xiaodong Yu, Kamil A. Iskra, Pete Beckman, Guangming Tan, Dingwen Tao
… communication time by 14.2×, and improves overall performance by 1.8×, all …
POSTER: Big Atomics and Fast Concurrent Hash Tables
Main Conference People: Daniel Anderson, Guy E. Blelloch, Siddhartha Jayanti
… approach is close to the fastest under all conditions and far outperforms others …
POSTER: Minimizing speculation overhead in a parallel recognizer for regular texts
Main Conference People: Angelo Borsotti, Luca Breveglieri, Stefano Crespi Reghizzi, Angelo Morzenti
… -based one on all benchmarks, while it performs as well as the DFA-based one on some …