PPoPP is the premier forum for leading work on all aspects of parallel programming, including theoretical foundations, techniques, languages, compilers, runtime systems, tools, and practical experience. In the context of the symposium, “parallel programming” encompasses work on concurrent and parallel systems (multicore, multi-threaded, heterogeneous, clustered, and distributed systems; grids; datacenters; clouds; and large scale machines). Given the rise of parallel architectures in the consumer market (desktops, laptops, and mobile devices) and data centers, PPoPP is particularly interested in work that addresses new parallel workloads and issues that arise out of extreme-scale applications or cloud platforms, as well as techniques and tools that improve the productivity of parallel programming or work towards improved synergy with such emerging architectures.

Proceedings will be available in the ACM Digital Library.

This program is tentative and subject to change.

Sat 1 Mar

Displayed time zone: Pacific Time (US & Canada)

07:30 - 08:30 (60m): Breakfast (Catering)

10:00 - 10:30 (30m): Coffee break (Catering)

15:00 - 15:30 (30m): Coffee break (Catering)

Sun 2 Mar

07:30 - 08:30 (60m): Breakfast (Catering)

10:00 - 10:30 (30m): Coffee break (Catering)

15:00 - 15:30 (30m): Coffee break (Catering)

18:00 - 20:00
Poster Session, Main Conference at Acacia C&D
18:00
2h
Poster
POSTER: A General and Scalable GCN Training Framework on CPU Supercomputers
Main Conference
Chen Zhuang Tokyo Institute of Technology, Riken Center for Computational Science, Peng Chen National Institute of Advanced Industrial Science and Technology, Xin Liu National Institute of Advanced Industrial Science & Technology, Rio Yokota Tokyo Institute of Technology, Nikoli Dryden Lawrence Livermore National Laboratory, Toshio Endo Tokyo Institute of Technology, Satoshi Matsuoka RIKEN, Mohamed Wahib RIKEN Center for Computational Science
18:00
2h
Poster
POSTER: Triangle Counting on Tensor Cores
Main Conference
YuAng Chen The Chinese University of Hong Kong, Jeffrey Xu Yu The Chinese University of Hong Kong
18:00
2h
Poster
POSTER: Minimizing speculation overhead in a parallel recognizer for regular texts
Main Conference
Angelo Borsotti Politecnico di Milano, Luca Breveglieri Politecnico di Milano, Stefano Crespi Reghizzi Politecnico di Milano and CNR-EIIT, Angelo Morzenti Politecnico di Milano
18:00
2h
Poster
POSTER: Boost Lock-free Queue and Stack with Batching
Main Conference
Ao Li Wuhan University, Wenhai Li Wuhan University, Yuan Chen Wuhan University, Lingfeng Deng Wuhan University
18:00
2h
Poster
POSTER: Frontier-guided Graph Reordering
Main Conference
Xinmiao Zhang SKLP, Institute of Computing Technology, CAS, Cheng Liu ICT CAS, Shengwen Liang SKLP, Institute of Computing Technology, CAS, Chenwei Xiong SKLP, Institute of Computing Technology, CAS, Yu Zhang School of Computer Science and Technology, Huazhong University of Science and Technology, Lei Zhang ICT CAS, Huawei Li SKLP, Institute of Computing Technology, CAS, Xiaowei Li SKLP, Institute of Computing Technology, CAS
18:00
2h
Poster
POSTER: Big Atomics and Fast Concurrent Hash Tables
Main Conference
Daniel Anderson Carnegie Mellon University, Guy E. Blelloch Carnegie Mellon University, USA, Siddhartha Jayanti Google Research
18:00
2h
Poster
POSTER: FastBWA: Practical and Cost-Efficient Genome Sequence Alignment Pipeline
Main Conference
Zhonghai Zhang Institute of Computing Technology, Chinese Academy of Sciences / University of Chinese Academy of Sciences, Yewen Li Institute of Computing Technology, Chinese Academy of Sciences / University of Chinese Academy of Sciences, Ke Meng Chinese Academy of Sciences, Chunming Zhang Institute of Computing Technology, Chinese Academy of Sciences, Guangming Tan Chinese Academy of Sciences (CAS)
18:00
2h
Poster
POSTER: Transactional Data Structures with Orthogonal Metadata
Main Conference
Yaodong Sheng Lehigh University, Ahmed Hassan Lehigh University, Michael Spear Lehigh University
18:00
2h
Poster
POSTER: High-performance Visual Semantics Compression for AI-Driven Science
Main Conference
Boyuan Zhang Indiana University, Luanzheng Guo Pacific Northwest National Laboratory, Jiannan Tian Indiana University, Jinyang Liu University of California, Riverside, Daoce Wang Indiana University, Fanjiang Ye Indiana University, Chengming Zhang University of Alabama, Jan Strube Pacific Northwest National Laboratory, Nathan R. Tallent Pacific Northwest National Laboratory, Dingwen Tao Institute of Computing Technology, Chinese Academy of Sciences
18:00
2h
Poster
POSTER: Magneto: Accelerating Parallel Structures in DNNs via Co-Optimization of Operators
Main Conference
Zhanyuan Di State Key Lab of Processors, Institute of Computing Technology, CAS, Leping Wang State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, Ziyi Ren State Key Lab of Processors, Institute of Computing Technology, CAS, En Shao State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, Jie Zhao Hunan University, Siyuan Feng Shanghai Jiao Tong University, Dingwen Tao Institute of Computing Technology, Chinese Academy of Sciences, Guangming Tan Chinese Academy of Sciences(CAS), Ninghui Sun State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences
18:00
2h
Poster
POSTER: TENSORMD: Molecular Dynamics Simulation with Ab Initio Accuracy of 50 Billion Atoms
Main Conference
Yucheng Ouyang Institute of Computing Technology, Chinese Academy of Sciences, Xin Chen Institute of Applied Physics and Computational Mathematics, Ying Liu Institute of Computing Technology, Chinese Academy of Sciences, Xin Chen, Honghui Shang Institute of Computing Technology, Chinese Academy of Sciences, Zhenchuan Chen Institute of Computing Technology, Chinese Academy of Sciences, Rongfen Lin National Research Center of Parallel Computer Engineering and Technology, Xingyu Gao Institute of Applied Physics and Computational Mathematics, Lifang Wang Institute of Applied Physics and Computational Mathematics, Fang Li National Research Center of Parallel Computer Engineering and Technology, Jiahao Shan Institute of Computing Technology, Chinese Academy of Sciences, Haifeng Song Institute of Applied Physics and Computational Mathematics, Huimin Cui Institute of Computing Technology, Chinese Academy of Sciences, Xiaobing Feng ICT CAS
18:00 - 20:00 (2h): Reception (Catering)

Mon 3 Mar

07:30 - 08:30 (60m): Breakfast (Catering)

08:30 - 09:30: Keynote at Acacia A&B

09:30 - 10:00 (30m): Coffee break (Catering)

10:00 - 11:00
Session 1: Graph Neural Networks (Session Chair: TBA), Main Conference at Acacia D
10:00
20m
Talk
Helios: Efficient Distributed Dynamic Graph Sampling for Online GNN Inference
Main Conference
Jie Sun Zhejiang University, Zuocheng Shi Zhejiang University, Li Su Alibaba Group, Wenting Shen Alibaba Group, Zeke Wang Zhejiang University, Yong Li Alibaba Group, Wenyuan Yu Alibaba Group, Wei Lin Alibaba Group, Fei Wu College of Computer Science and Technology in Zhejiang University, Jingren Zhou Alibaba Group, Bingsheng He National University of Singapore
10:20
20m
Talk
Accelerating GNNs on GPU Sparse Tensor Cores through N:M Sparsity-Oriented Graph Reordering
Main Conference
Jou-An Chen North Carolina State University, Hsin-Hsuan Sung North Carolina State University, Ruifeng Zhang North Carolina State University, Ang Li Pacific Northwest National Laboratory, Xipeng Shen North Carolina State University
10:40
20m
Talk
Adaptive Parallel Training for Graph Neural Networks
Main Conference
Kaihao Ma The Chinese University of Hong Kong, Renjie Liu Southern University of Science and Technology, Xiao Yan Centre for Perceptual and Interactive Intelligence (CPII), Zhenkun Cai Amazon, Xiang Song Amazon Web Services, Minjie Wang Amazon Web Services, Yichao Li The Chinese University of Hong Kong, James Cheng The Chinese University of Hong Kong
11:00 - 11:20 (20m): Coffee break (Catering)

11:20 - 12:20
Session 2: GPU I (Session Chair: TBA), Main Conference at Acacia D
11:20
20m
Talk
RT–BarnesHut: Accelerating Barnes–Hut Using Ray-Tracing Hardware
Main Conference
Vani Nagarajan Purdue University, Rohan Gangaraju Purdue University, Kirshanthan Sundararajah Virginia Tech, Artem Pelenitsyn Purdue University, Milind Kulkarni Purdue University
11:40
20m
Talk
EVeREST: An Effective and Versatile Runtime Energy Saving Tool for GPUs
Main Conference
Anna Yue University of Minnesota at Twin Cities, Pen-Chung Yew University of Minnesota at Twin Cities, Sanyam Mehta HPE
12:00
20m
Talk
TurboFFT: Co-Designed High-Performance and Fault-Tolerant Fast Fourier Transform on GPUs
Main Conference
Shixun Wu University of California, Riverside, Yujia Zhai NVIDIA Corporation, Jinyang Liu University of California, Riverside, Jiajun Huang University of California, Riverside, Zizhe Jian University of California, Riverside, Huangliang Dai University of California, Riverside, Sheng Di Argonne National Laboratory, Franck Cappello Argonne National Laboratory, Zizhong Chen University of California, Riverside
12:20 - 14:00 (1h40m): Lunch (Catering)

14:00 - 15:20
Session 3: Concurrent Data Structures and Synchronization I (Session Chair: TBA), Main Conference at Acacia D
14:00
20m
Talk
Reciprocating Locks
Main Conference
Dave Dice Oracle Labs, Alex Kogan Oracle Labs, USA
14:20
20m
Talk
Aggregating Funnels for Faster Fetch&Add and Queues
Main Conference
Younghun Roh MIT, Yuanhao Wei University of British Columbia, Eric Ruppert York University, Panagiota Fatourou FORTH ICS and University of Crete, Greece, Siddhartha Jayanti Google Research, Julian Shun MIT
14:40
20m
Talk
Fairer and More Scalable Reader-Writer Locks by Optimizing Queue Management
Main Conference
Takashi Hoshino Cybozu Labs, Inc., Kenjiro Taura The University of Tokyo
15:00
20m
Talk
Publish on Ping: A Better Way to Publish Reservations in Memory Reclamation for Concurrent Data Structures
Main Conference
Ajay Singh University of Waterloo, Trevor Brown University of Toronto
15:20 - 15:40 (20m): Coffee break (Catering)

15:40 - 16:40
Session 4: Memory (Session Chair: TBA), Main Conference at Acacia D
15:40
20m
Talk
AC-Cache: A Memory-Efficient Caching System for Small Objects via Exploiting Access Correlations
Main Conference
Fulin Nan Xiamen University, Zhirong Shen Xiamen University
16:00
20m
Talk
Effectively Virtual Page Prefetching via Spatial-Temporal Patterns for Memory-intensive Cloud Applications
Main Conference
Yun Wang Shanghai Jiao Tong University, Liang Chen, Tianmai Deng Shanghai Jiao Tong University, Ben Luo Alibaba Group, Yibin Shen Alibaba Cloud, Zhixiang Wei Shanghai Jiao Tong University, Yixiao Xu Shanghai Jiao Tong University, Minglang Huang Shanghai Jiao Tong University, Zhengwei Qi Shanghai Jiao Tong University
16:20
20m
Talk
Harnessing Inter-GPU Shared Memory for Seamless MoE Communication-Computation Fusion
Main Conference
Hulin Wang, Yaqi Xia Wuhan University, Donglin Yang Nvidia Corporation, Xiaobo Zhou University of Macau, Dazhao Cheng Wuhan University
16:40 - 17:00 (20m): Coffee break (Catering)

17:00 - 18:00
Session 5: Deep Neural Networks (Session Chair: TBA), Main Conference at Acacia D
17:00
20m
Talk
FlashTensor: Optimizing Tensor Programs by Leveraging Fine-grained Tensor Property
Main Conference
Runxin Zhong Tsinghua University, Yuyang Jin Tsinghua University, Chen Zhang Tsinghua University, Kinman Lei Tsinghua University, Shuangyu Li Tsinghua University, Jidong Zhai Tsinghua University
17:20
20m
Talk
Mario: Near Zero-cost Activation Checkpointing in Pipeline Parallelism
Main Conference
Weijia Liu Institute of Computing Technology, Chinese Academy of Sciences, Mingzhen Li Institute of Computing Technology, Chinese Academy of Sciences, Guangming Tan Chinese Academy of Sciences (CAS), Weile Jia Institute of Computing Technology, Chinese Academy of Sciences
17:40
20m
Talk
COMPSO: Optimizing Gradient Compression for Distributed Training with Second-Order Optimizers
Main Conference
Baixi Sun Indiana University Bloomington, Weijin Liu Stevens Institute of Technology, J. Gregory Pauloski University of Chicago, Jiannan Tian Indiana University, Jinda Jia Indiana University, Daoce Wang Indiana University, Boyuan Zhang Indiana University, Mingkai Zheng Department of Electrical and Computer Engineering at Rutgers University, Sheng Di Argonne National Laboratory, Sian Jin Temple University, Zhao Zhang Peking University, Xiaodong Yu Stevens Institute of Technology, Kamil A. Iskra Argonne National Laboratory, Pete Beckman Northwestern University and Argonne National Laboratory, Guangming Tan Chinese Academy of Sciences(CAS), Dingwen Tao Institute of Computing Technology, Chinese Academy of Sciences

Tue 4 Mar

Displayed time zone: Pacific Time (US & Canada) change

08:30 - 09:30: Keynote at Acacia A&B

09:30 - 10:00 (30m): Coffee break (Catering)

10:00 - 11:00
Session 6: Large Language Models (Session Chair: TBA), Main Conference at Acacia D
10:00
20m
Talk
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models
Main Conference
Elias Frantar ISTA, Roberto López Castro Universidade da Coruña, Jiale Chen ISTA, Torsten Hoefler ETH Zurich, Dan Alistarh IST Austria
10:20
20m
Talk
WeiPipe: Weight Pipeline Parallelism for Communication-Effective Long-Context Large Model Training
Main Conference
Junfeng Lin Tsinghua University, Ziming Liu National University of Singapore, Yang You National University of Singapore, Jun Wang CETHIK Group Co. Ltd., Weihao Zhang Lynxi Technologies Co. Ltd, Rong Zhao Tsinghua University
10:40
20m
Talk
ATTNChecker: Highly-Optimized Fault Tolerant Attention for Large Language Model Training
Main Conference
Yuhang Liang University of Oregon, Xinyi Li Pacific Northwest National Laboratory(PNNL), Jie Ren William & Mary, Ang Li Pacific Northwest National Laboratory, Bo Fang Pacific Northwest National Laboratory(PNNL), Jieyang Chen University of Oregon
11:00 - 11:20 (20m): Coffee break (Catering)

11:20 - 12:20
Session 7: Scheduling and Resource Management (Session Chair: TBA), Main Conference at Acacia D
11:20
20m
Talk
SGDRC: Software-Defined Dynamic Resource Control for Concurrent DNN Inference on NVIDIA GPUs
Main Conference
Yongkang Zhang HKUST, Haoxuan Yu HKUST, Chenxia Han CUHK, Cheng Wang Alibaba Group, Baotong Lu Microsoft Research, Yunzhe Li Shanghai Jiao Tong University, Zhifeng Jiang HKUST, Yang Li China University of Geosciences, Xiaowen Chu Data Science and Analytics Thrust, HKUST(GZ), Huaicheng Li Virginia Tech
11:40
20m
Talk
DORADD: Deterministic Parallel Execution in the Era of Microsecond-Scale Computing
Main Conference
Scofield Liu Imperial College London, Musa Unal EPFL, Matthew J. Parkinson Microsoft Azure Research, Marios Kogias Imperial College London; Microsoft Research
12:00
20m
Talk
WaterWise: Co-optimizing Carbon- and Water-Footprint Toward Environmentally Sustainable Cloud Computing
Main Conference
Yankai Jiang Northeastern University, Rohan Basu Roy Northeastern University, Raghavendra Kanakagiri Indian Institute of Technology Tirupati, Devesh Tiwari Northeastern University
12:20 - 14:00 (1h40m): Lunch (Catering)

14:00 - 15:20
Session 8: Tensor Cores (Session Chair: TBA), Main Conference at Acacia D
14:00
20m
Talk
FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores
Main Conference
Jinliang Shi Beijing University of Posts and Telecommunications, Shigang Li Beijing University of Posts and Telecommunications, Youxuan Xu Beijing University of Posts and Telecommunications, Rongtian Fu Beijing University of Posts and Telecommunications, Xueying Wang Beijing University of Posts and Telecommunications, Tong Wu Beijing University of Posts and Telecommunications
14:20
20m
Talk
Acc-SpMM: Accelerating General-purpose Sparse Matrix-Matrix Multiplication with GPU Tensor Cores
Main Conference
Haisha Zhao Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Li San Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Jiaheng Wang Renmin University of China, Chunbao Zhou Computer Network Information Center, Chinese Academy of Sciences, Jue Wang Computer Network Information Center, Chinese Academy of Sciences, Zhikuang Xin Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Shunde Li Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Zhiqiang Liang Computer Network Information Center, Chinese Academy of Sciences, Zhijie Pan Hangzhou Dianzi University, Fang Liu Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Yan Zeng Hangzhou Dianzi University, Yangang Wang Computer Network Information Center, Chinese Academy of Sciences, Xuebin Chi Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences
14:40
20m
Talk
BerryBees: Breadth First Search by Bit-Tensor-Cores
Main Conference
Yuyao Niu Barcelona Supercomputing Center (BSC) - Universitat Politècnica de Catalunya (UPC), Marc Casas Barcelona Supercomputing Center
15:00
20m
Talk
FlashFFTStencil: Bridging Fast Fourier Transforms to Memory-Efficient Stencil Computations on Tensor Core Units
Main Conference
Haozhi Han Microsoft Research; Peking University, Kun Li Microsoft Research, Wei Cui Microsoft Research, Donglin Bai Microsoft Research, Yiwei Zhang UCAS; Microsoft Research, Liang Yuan Chinese Academy of Sciences, Yifeng Cheng Peking University, Yunquan Zhang, Ting Cao Microsoft Research, Mao Yang Microsoft Research
15:20 - 15:40 (20m): Coffee break (Catering)

15:40 - 17:00
Session 9: Concurrent Data Structures and Synchronization II (Session Chair: TBA), Main Conference at Acacia D
15:40
20m
Talk
PANNS: Enhancing Graph-based Approximate Nearest Neighbor Search through Recency-aware Construction and Parameterized Search
Main Conference
Xizhe Yin University of California, Riverside, Chao Gao University of California Riverside, Zhijia Zhao University of California at Riverside, Rajiv Gupta University of California at Riverside (UCR)
16:00
20m
Talk
Balanced Allocations over Efficient Queues: A Fast Relaxed FIFO Queue
Main Conference
Kåre von Geijer Chalmers University of Technology, Philippas Tsigas Chalmers University of Technology, Elias Johansson Chalmers University of Technology, Sebastian Hermansson Chalmers University of Technology
16:20
20m
Talk
LibRTS: A Spatial Indexing Library by Ray Tracing
Main Conference
Liang Geng The Ohio State University, USA, Rubao Lee, Xiaodong Zhang The Ohio State University
16:40
20m
Talk
Crystality: A Programming Model for Smart Contracts on Parallel EVMs
Main Conference
Hao Wang International Digital Economy Academy (IDEA), Shenzhen, China; and Fullnodes Labs, Minghao Pan International Digital Economy Academy (IDEA), Shenzhen, China; and Fullnodes Labs, Jiaping Wang International Digital Economy Academy (IDEA), Shenzhen, China; and Fullnodes Labs

Wed 5 Mar

Displayed time zone: Pacific Time (US & Canada) change

07:30 - 08:30 (60m): Breakfast (Catering)

08:30 - 09:30: Keynote at Acacia A&B

09:30 - 10:00 (30m): Coffee break (Catering)

10:00 - 11:20
Session 10: GPU II (Session Chair: TBA), Main Conference at Acacia D
10:00
20m
Talk
Popcorn: Accelerating Kernel K-means on GPUs through Sparse Linear Algebra
Main Conference
Julian Bellavita Cornell University, Thomas Pasquali University of Trento, Laura Del Rio University of Trento, Flavio Vella Free University of Bozen, Giulia Guidi Cornell University
10:20
20m
Talk
Swift Unfolding of Communities: GPU-Accelerated Louvain Algorithm
Main Conference
Zhibin Wang Nanjing University, Xi Lin Nanjing University, Xue Li Alibaba Group, Pinhuan Wang Rutgers, The State University of New Jersey, Ziheng Meng Nanjing University, Hang Liu Rutgers, The State University of New Jersey, Chen Tian Nanjing University, Sheng Zhong Nanjing University
10:40
20m
Talk
GLUMIN: Fast Connectivity Check Based on LUTs For Efficient Graph Pattern Mining
Main Conference
Weichen Cao Institute of Computing Technology, Chinese Academy of Sciences, Ke Meng Chinese Academy of Sciences, Zhiheng Lin Institute of Computing Technology, Chinese Academy of Sciences, Guangming Tan Chinese Academy of Sciences (CAS)
11:00
20m
Talk
Improving Tridiagonalization Performance on GPU Architectures
Main Conference
Hansheng Wang University of Electronic Science and Technology of China, Zhekai Duan University of Edinburgh, Zitian Zhao University of Electronic Science and Technology of China, Siqi Wu University of Electronic Science and Technology of China, Saiqi Zheng Xi'an Jiaotong-Liverpool University, Qiao Li University of Electronic Science and Technology of China, Xu Jiang University of Electronic Science and Technology of China, Shaoshuai Zhang
11:20 - 11:40 (20m): Coffee break (Catering)

11:40 - 13:00
Session 11: Parallel Algorithms and Applications (Session Chair: TBA), Main Conference at Acacia D
11:40
20m
Talk
Jigsaw: Toward Conflict-free Vectorized Stencil Computation by Tessellating Swizzled Registers
Main Conference
Yiwei Zhang UCAS; Microsoft Research, Kun Li Microsoft Research, Liang Yuan Chinese Academy of Sciences, Haozhi Han Microsoft Research; Peking University, Yunquan Zhang, Ting Cao Microsoft Research, Mao Yang Microsoft Research
12:00
20m
Talk
Semi-StructMG: A Fast and Scalable Semi-Structured Algebraic Multigrid
Main Conference
Yi Zong Tsinghua University, Chensong Zhang Academy of Mathematics and Systems Science, Longjiang Mu Laoshan Laboratory, Jianchun Wang China Ship Scientific Research Center, Jian Sun CMA Earth System Modeling and Prediction Center, Xiaowen Xu Institute of Applied Physics and Computational Mathematics, Xinliang Wang Huawei Technologies Co., Ltd, Peinan Yu Tsinghua University, Wei Xue Tsinghua University
12:20
20m
Talk
SBMGT: Scaling Bayesian Multinomial Group Testing
Main Conference
Weicong Chen University of California, Merced, Hao Qi University of California, Merced, Curtis Tatsuoka University of Pittsburgh, Xiaoyi Lu UC Merced
12:40
20m
Talk
An AI-Enhanced 1km-Resolution Seamless Global Weather and Climate Model to Achieve Year-Scale Simulation Speed using 34 Million Cores
Main Conference
Xiaohui Duan Shandong University, Yi Zhang PIESAT Information Technology Co., Ltd., Kai Xu Laoshan Laboratory, Haohuan Fu Tsinghua University, Bin Yang Tianjin University, Yiming Wang PIESAT Information Technology Co., Ltd., Yilun Han Tsinghua University, Siyuan Chen PIESAT Information Technology Co., Ltd., Zhuangzhuang Zhou National Supercomputing Center in Wuxi, Chenyu Wang National Supercomputing Center in Wuxi, Dongqiang Huang National Supercomputing Center in Wuxi, Huihai An Shandong University, Xiting Ju Tsinghua University, Haopeng Huang Tsinghua University, Zhuang Liu Tsinghua University, Wei Xue Tsinghua University, Weiguo Liu Shandong University, Bowen Yan Tsinghua University, Jianye Hou The Chinese University of Hong Kong, Maoxue Yu Laoshan Laboratory, Wenguang Chen Tsinghua University; Pengcheng Laboratory, Jian Li Chinese Academy of Meteorological Sciences, Zhao Jing Laoshan Laboratory, Hailong Liu Laoshan Laboratory, Lixin Wu Laoshan Laboratory

Accepted Papers

All accepted papers appear in the Main Conference track:

AC-Cache: A Memory-Efficient Caching System for Small Objects via Exploiting Access Correlations
Accelerating GNNs on GPU Sparse Tensor Cores through N:M Sparsity-Oriented Graph Reordering
Acc-SpMM: Accelerating General-purpose Sparse Matrix-Matrix Multiplication with GPU Tensor Cores
Adaptive Parallel Training for Graph Neural Networks
Aggregating Funnels for Faster Fetch&Add and Queues
An AI-Enhanced 1km-Resolution Seamless Global Weather and Climate Model to Achieve Year-Scale Simulation Speed using 34 Million Cores
ATTNChecker: Highly-Optimized Fault Tolerant Attention for Large Language Model Training
Balanced Allocations over Efficient Queues: A Fast Relaxed FIFO Queue
BerryBees: Breadth First Search by Bit-Tensor-Cores
COMPSO: Optimizing Gradient Compression for Distributed Training with Second-Order Optimizers
Crystality: A Programming Model for Smart Contracts on Parallel EVMs
DORADD: Deterministic Parallel Execution in the Era of Microsecond-Scale Computing
Effectively Virtual Page Prefetching via Spatial-Temporal Patterns for Memory-intensive Cloud Applications
EVeREST: An Effective and Versatile Runtime Energy Saving Tool for GPUs
Fairer and More Scalable Reader-Writer Locks by Optimizing Queue Management
FlashFFTStencil: Bridging Fast Fourier Transforms to Memory-Efficient Stencil Computations on Tensor Core Units
FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores
FlashTensor: Optimizing Tensor Programs by Leveraging Fine-grained Tensor Property
GLUMIN: Fast Connectivity Check Based on LUTs For Efficient Graph Pattern Mining
Harnessing Inter-GPU Shared Memory for Seamless MoE Communication-Computation Fusion
Helios: Efficient Distributed Dynamic Graph Sampling for Online GNN Inference
Improving Tridiagonalization Performance on GPU Architectures
Jigsaw: Toward Conflict-free Vectorized Stencil Computation by Tessellating Swizzled Registers
LibRTS: A Spatial Indexing Library by Ray Tracing
Mario: Near Zero-cost Activation Checkpointing in Pipeline Parallelism
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models
PANNS: Enhancing Graph-based Approximate Nearest Neighbor Search through Recency-aware Construction and Parameterized Search
Popcorn: Accelerating Kernel K-means on GPUs through Sparse Linear Algebra
POSTER: A General and Scalable GCN Training Framework on CPU Supercomputers
POSTER: Big Atomics and Fast Concurrent Hash Tables
POSTER: Boost Lock-free Queue and Stack with Batching
POSTER: FastBWA: Practical and Cost-Efficient Genome Sequence Alignment Pipeline
POSTER: Frontier-guided Graph Reordering
POSTER: High-performance Visual Semantics Compression for AI-Driven Science
POSTER: Magneto: Accelerating Parallel Structures in DNNs via Co-Optimization of Operators
POSTER: Minimizing speculation overhead in a parallel recognizer for regular texts
POSTER: TENSORMD: Molecular Dynamics Simulation with Ab Initio Accuracy of 50 Billion Atoms
POSTER: Transactional Data Structures with Orthogonal Metadata
POSTER: Triangle Counting on Tensor Cores
Publish on Ping: A Better Way to Publish Reservations in Memory Reclamation for Concurrent Data Structures
Reciprocating Locks
RT–BarnesHut: Accelerating Barnes–Hut Using Ray-Tracing Hardware
SBMGT: Scaling Bayesian Multinomial Group Testing
Semi-StructMG: A Fast and Scalable Semi-Structured Algebraic Multigrid
SGDRC: Software-Defined Dynamic Resource Control for Concurrent DNN Inference on NVIDIA GPUs
Swift Unfolding of Communities: GPU-Accelerated Louvain Algorithm
TurboFFT: Co-Designed High-Performance and Fault-Tolerant Fast Fourier Transform on GPUs
WaterWise: Co-optimizing Carbon- and Water-Footprint Toward Environmentally Sustainable Cloud Computing
WeiPipe: Weight Pipeline Parallelism for Communication-Effective Long-Context Large Model Training

Call for Papers

PPoPP 2025: 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming

Location: Las Vegas, Nevada, USA (collocated with CC 2025, CGO 2025, and HPCA 2025). Dates: 01 March – 05 March, 2025.

Submission URL: https://ppopp25.hotcrp.com

Important dates:

  • Full paper submission: Friday, August 16, 2024
  • Author response period: Wednesday, October 23 – Friday, October 25, 2024
  • Author notification: Monday, November 11, 2024
  • Artifact submission to AE committee: Monday, November 18, 2024
  • Artifact notification by AE committee: Monday, January 6, 2025
  • Final paper due: Friday, January 10, 2025

Scope:

PPoPP is the premier forum for leading work on all aspects of parallel programming, including theoretical foundations, techniques, languages, compilers, runtime systems, tools, and practical experience. In the context of the symposium, “parallel programming” encompasses work on concurrent and parallel systems (multicore, multi-threaded, heterogeneous, clustered, and distributed systems, grids, accelerators such as ASICs, GPUs, FPGAs, data centers, clouds, large scale machines, and quantum computers). PPoPP is interested in all aspects related to improving the productivity of parallel programming on modern architectures. PPoPP is also interested in work that addresses new parallel workloads and issues that arise out of large-scale scientific or enterprise workloads.

Specific topics of interest include (but are not limited to):

  • Languages, compilers, and runtime systems for parallel programs
  • Concurrent data structures
  • Development, analysis, or management tools
  • Fault tolerance for parallel systems
  • Formal analysis and verification
  • High-performance libraries
  • Middleware for parallel systems
  • Machine learning for parallel systems
  • Parallel algorithms
  • Parallel applications including scientific computing (e.g., simulation and modeling) and enterprise workloads (e.g., web, search, analytics, cloud, and machine learning)
  • Parallel frameworks
  • Parallel programming for deep memory hierarchies including nonvolatile memory
  • Parallel programming theory and models
  • Performance analysis, debugging and optimization
  • Productivity tools for parallel systems
  • Software engineering for parallel programs
  • Synchronization and concurrency control

Papers should report on original research relevant to parallel programming and should contain enough background materials to make them accessible to the entire parallel programming research community. Papers describing experience should indicate how they illustrate general principles or lead to new insights; papers about parallel programming foundations should indicate how they relate to practice. PPoPP submissions will be evaluated based on their technical merit and accessibility. Submissions should clearly motivate the importance of the problem being addressed, compare to the existing body of work on the topic, and explicitly and precisely state the paper’s key contributions and results towards addressing the problem. Submissions should strive to be accessible both to a broad audience and to experts in the area.

Paper Submission:

Conference submission site: https://ppopp25.hotcrp.com

All submissions must be made electronically through the conference website and must include an abstract (100–400 words), author contact information, and the full list of authors and their affiliations. Full paper submissions must be in PDF format printable on both A4 and US letter-size paper.

All papers must be prepared in ACM Conference Format using the 2-column acmart format: use the SIGPLAN proceedings template acmart-sigplanproc-template.tex for LaTeX, and interim-layout.docx for Word. You may also want to consult the official ACM information on the Master Article Template and related tools. Important note: the Word template (interim-layout.docx) on the ACM website uses 9pt font; you need to increase it to 10pt.
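For LaTeX users, a submission skeleton consistent with these requirements might look like the following. This is only a sketch: the exact document-class options of the SIGPLAN template are set inside acmart-sigplanproc-template.tex itself, and the `review` and `anonymous` options shown here are assumptions based on the double-blind policy in this call, so authors should defer to the official template.

```latex
% Minimal sketch of a PPoPP-style submission using the acmart class.
% Assumed options: 'sigplan' selects the 2-column SIGPLAN layout,
% '10pt' satisfies the typeface requirement, 'review' adds line
% numbers for reviewers, and 'anonymous' suppresses author identity
% for double-blind reviewing.
\documentclass[sigplan,10pt,review,anonymous]{acmart}

\begin{document}

\title{Paper Title}
% Author metadata may still be entered; the 'anonymous' option
% hides it in the rendered PDF.
\author{Author Name}
\affiliation{\institution{Institution}\country{Country}}

\begin{abstract}
An abstract of 100--400 words.
\end{abstract}

\maketitle

\section{Introduction}
% Body text: at most 10 pages, not including references.

\bibliographystyle{ACM-Reference-Format}
% \bibliography{refs}  % hypothetical bibliography file name

\end{document}
```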

Papers should contain a maximum of 10 pages of text and figures (in a typeface no smaller than 10pt), NOT INCLUDING references. There is no page limit for references, and each reference must include the names of all authors (not "et al."). Appendices are not allowed, but authors may submit supplementary material, such as proofs or source code; all supplementary material must be in PDF or ZIP format. Reviewers may consult supplementary material at their discretion.

Submission is double-blind and authors will need to identify any potential conflicts of interest with PC and Extended Review Committee members, as defined here: http://www.sigplan.org/Resources/Policies/Review/ (ACM SIGPLAN policy).

PPoPP 2025 will employ a double-blind reviewing process. To facilitate this process, submissions should not reveal the identity of the authors in any way. Authors should leave out author names and affiliations from the body of their submission. They should also ensure that any references to their own related work are in the third person (e.g., not "We build on our previous work …" but rather "We build on the work of …"). The purpose of this process is to help the PC and external reviewers come to an initial judgment about the paper without bias, not to make it impossible for them to discover the authors if they were to try. Nothing should be done in the name of anonymity that weakens the submission or makes the job of reviewing the paper more difficult. In particular, important background references should not be omitted or anonymized. In addition, authors should feel free to disseminate their ideas or draft versions of their papers as they normally would. For instance, authors may post drafts of their papers on the web or give talks on their research ideas. Authors with further questions on double-blind reviewing are encouraged to contact the Program Chairs by email.

To facilitate fair and unbiased reviews for all submissions, PPoPP 2025 may use the Toronto Paper Matching System (TPMS) to assign papers to reviewers. For authors, this means that submissions may be uploaded to TPMS.

Submissions should be in PDF and printable on both US Letter and A4 paper. Papers may be resubmitted to the submission site multiple times up until the deadline; the last version submitted before the deadline will be the version reviewed. Papers that exceed the length requirement, deviate from the expected format, or are submitted late will be rejected.

All submissions that are not accepted for regular presentations will be automatically considered for posters. Two-page summaries of accepted posters will be included in the conference proceedings.

To allow reproducibility, we encourage authors of accepted papers to submit their papers for Artifact Evaluation (AE). The AE process begins after the acceptance notification and is run by a separate committee whose task is to assess how the artifacts support the work described in the papers. Artifact evaluation is voluntary and will not affect paper acceptance but will be taken into consideration when selecting papers for awards. Papers that go through the AE process successfully will receive at least one ACM reproducibility badge, printed on the papers themselves. More information will be posted on the AE website.

Deadlines expire at midnight anywhere on earth.

Publication Date:

The titles of all accepted papers are typically announced shortly after the author notification date (late November 2024). Note, however, that this is not the official publication date. The official publication date is the date the proceedings are made available in the ACM Digital Library. ACM will make the proceedings available via the Digital Library up to 2 weeks prior to the first day of the conference. The official publication date affects the deadline for any patent filings related to published work.

ACM Publications Policies:

By submitting your article to an ACM Publication, you acknowledge that you and your co-authors are subject to all ACM Publications Policies, including ACM's Publications Policy on Research Involving Human Participants and Subjects (https://www.acm.org/publications/policies/research-involving-human-participants-and-subjects). Alleged violations of this policy or any ACM Publications Policy will be investigated by ACM and may result in a full retraction of your paper, in addition to other potential penalties, as per ACM Publications Policy.

Please ensure that you and your co-authors obtain an ORCID ID so you can complete the publishing process for your accepted paper. We are committed to improving author discoverability, ensuring proper attribution, and contributing to ongoing community efforts around name normalization; your ORCID ID will help in these efforts. Please follow the link https://dl.acm.org/journal/pacmcgit/author-guidelines to see ACM's ORCID requirements for authors.