COMPSO: Optimizing Gradient Compression for Distributed Training with Second-Order Optimizers
Second-order optimization methods have been developed to enhance convergence and generalization in deep neural network (DNN) training compared to first-order methods such as Stochastic Gradient Descent (SGD). However, these methods face challenges in distributed settings due to high communication overhead. Gradient compression, a technique commonly used to accelerate communication for first-order approaches, often yields low communication reduction ratios, degraded model accuracy, or high compression overhead when applied to second-order methods. To address these limitations, we introduce COMPSO, a novel gradient compression method for second-order optimizers that effectively reduces communication costs while preserving the advantages of second-order optimization. COMPSO employs stochastic rounding to maintain accuracy and filters out minor gradients to improve compression ratios. In addition, we develop GPU optimizations to minimize compression overhead and a performance model to ensure end-to-end performance gains across various systems. Evaluation of COMPSO on different DNN models shows that it achieves a compression ratio of 22.1$\times$, reduces communication time by 14.2$\times$, and improves overall performance by 1.8$\times$, all without any drop in model accuracy.
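To make the two compression ideas named above concrete, the following is a minimal sketch (not the authors' implementation) of unbiased stochastic rounding to a low-precision grid and magnitude-based filtering of small gradients before communication; the function names, the grid step `scale`, and the `keep_ratio` threshold are illustrative assumptions.

```python
import torch

def stochastic_round(grad: torch.Tensor, scale: float) -> torch.Tensor:
    """Round grad/scale to an integer grid, rounding up with probability equal
    to the fractional part so the quantization is unbiased in expectation."""
    scaled = grad / scale
    lower = torch.floor(scaled)
    prob_up = scaled - lower                      # fractional part in [0, 1)
    rounded = lower + (torch.rand_like(scaled) < prob_up).to(scaled.dtype)
    return rounded * scale

def filter_small(grad: torch.Tensor, keep_ratio: float = 0.05) -> torch.Tensor:
    """Zero out all but the largest-magnitude entries (simple top-k style filter)."""
    k = max(1, int(keep_ratio * grad.numel()))
    threshold = torch.topk(grad.abs().flatten(), k).values.min()
    return torch.where(grad.abs() >= threshold, grad, torch.zeros_like(grad))

# Example: compress a gradient tensor before a collective communication step.
g = torch.randn(1024)
compressed = stochastic_round(filter_small(g), scale=1e-3)
```

Stochastic rounding preserves the gradient in expectation (avoiding the systematic bias of round-to-nearest), while dropping small-magnitude entries increases sparsity and hence the achievable compression ratio; how COMPSO combines these steps with second-order updates is detailed in the paper itself.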