Mon 3 Mar 2025 16:20 - 16:40 at Acacia D - Session 4: Memory (Session Chair: Dong Li)

The Mixture of Experts (MoE) architecture enhances model quality by scaling up model parameters. However, its development is hindered in distributed training scenarios by significant communication overhead and expert load imbalance. Existing methods, which allow only coarse-grained overlapping of communication and computation, slightly alleviate communication costs but notably impair computational efficiency. Furthermore, current approaches to addressing load imbalance often compromise model quality.

We introduce CCFuser, a novel framework designed for efficient training of MoE models. CCFuser replaces the costly All2All operations typical in MoE architectures with efficient GPU shared memory access. This allows local and remote data to be computed concurrently within a fused kernel, achieving substantially higher compute FLOPS for GEMM operations. Additionally, CCFuser addresses load imbalance with a resource-efficient expert reassignment algorithm, which optimizes the use of computational resources through equivalent graph transformations without sacrificing statistical accuracy. By integrating these optimizations, CCFuser significantly enhances GPU utilization efficiency. Experiments conducted on A100 servers show that CCFuser outperforms state-of-the-art methods such as FastMoE and FasterMoE by an average of 2.96x and 2.48x, respectively, achieving a maximum speedup of 4.34x.
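To make the core idea concrete, below is a minimal CUDA sketch (not CCFuser's actual kernel) of the general technique the abstract describes: enabling peer access between two GPUs so that a single compute kernel can read a remote GPU's buffer directly, rather than first staging it with a separate All2All-style exchange. The kernel and buffer names (fusedAdd, d0_in, d1_in, d0_out) are hypothetical and the computation is a trivial element-wise add standing in for the fused GEMM work.

// Sketch only: fuse access to local and peer-GPU data into one kernel pass,
// assuming the two GPUs support P2P access (e.g., via NVLink or PCIe).
#include <cstdio>
#include <cuda_runtime.h>

__global__ void fusedAdd(const float* localIn, const float* remoteIn,
                         float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Local and remote (peer-resident) data are consumed in the same kernel,
        // so no separate communication step is needed before computing.
        out[i] = localIn[i] + remoteIn[i];
    }
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Let GPU 0 dereference GPU 1's memory directly.
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);

    float *d0_in, *d0_out, *d1_in;
    cudaMalloc(&d0_in, bytes);
    cudaMalloc(&d0_out, bytes);
    cudaMemset(d0_in, 0, bytes);

    cudaSetDevice(1);
    cudaMalloc(&d1_in, bytes);        // "remote" buffer resident on GPU 1
    cudaMemset(d1_in, 0, bytes);

    // Launch the fused kernel on GPU 0; it reads GPU 1's buffer in place.
    cudaSetDevice(0);
    fusedAdd<<<(n + 255) / 256, 256>>>(d0_in, d1_in, d0_out, n);
    cudaDeviceSynchronize();
    printf("fused kernel status: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(d0_in); cudaFree(d0_out);
    cudaSetDevice(1); cudaFree(d1_in);
    return 0;
}

In the paper's setting, the remote reads would feed expert GEMMs directly, which is what allows communication to disappear into the compute kernel instead of being overlapped coarsely with it.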

Mon 3 Mar

Displayed time zone: Pacific Time (US & Canada)

15:40 - 16:40
Session 4: Memory (Session Chair: Dong Li) Main Conference at Acacia D
15:40
20m
Talk
AC-Cache: A Memory-Efficient Caching System for Small Objects via Exploiting Access Correlations
Main Conference
Fulin Nan Xiamen University, Zhirong Shen Xiamen University
16:00
20m
Talk
Effectively Virtual Page Prefetching via Spatial-Temporal Patterns for Memory-intensive Cloud Applications
Main Conference
Yun Wang Shanghai Jiao Tong University, Liang Chen, Tianmai Deng Shanghai Jiao Tong University, Ben Luo Alibaba Group, Yibin Shen Alibaba Cloud, Zhixiang Wei Shanghai Jiao Tong University, Yixiao Xu Shanghai Jiao Tong University, Minglang Huang Shanghai Jiao Tong University, Zhengwei Qi Shanghai Jiao Tong University
16:20
20m
Talk
Harnessing Inter-GPU Shared Memory for Seamless MoE Communication-Computation Fusion
Main Conference
Hulin Wang, Yaqi Xia Wuhan University, Donglin Yang Nvidia Corporation, Xiaobo Zhou University of Macau, Dazhao Cheng Wuhan University