Long-context deep neural networks (DNNs) have proven highly effective in natural language processing, video understanding, and other applications, and are becoming a prevailing trend. However, they introduce extremely large intermediate tensors, resulting in substantial memory overhead. While considerable effort has been devoted to optimizing DNNs, insufficient awareness of tensor properties has hindered effective memory optimization and can lead to inefficient computation in long-context scenarios. In this paper, we present FlashTensor, a DNN optimization system that reduces memory overhead and improves inference performance by exploiting tensor properties. We first extract and identify essential tensor properties, such as reduce dependency and broadcastability, from the computation graph, and then apply transformation and kernel-mapping optimizations based on these properties. Experiments on seven models show that FlashTensor achieves average speedups of 1.50× and 3.24× for end-to-end and core-module performance, respectively, over eight state-of-the-art systems on an H100 GPU (1.86× and 3.70× on an A100).
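To make the memory problem concrete, below is a minimal PyTorch sketch (our illustration, not code from the paper; the attention example, function names, and chunk size are assumptions). It shows how the attention score tensor grows quadratically with context length, and how its reduce dependency (softmax and the weighted sum only reduce over the key axis) together with the broadcastability of per-row statistics allow the scores to be consumed chunk by chunk rather than materialized in full, in the spirit of the property-driven transformations described above.

```python
import torch

# Hypothetical sketch: long-context attention memory blow-up and a
# property-aware rewrite. Shapes: (batch, heads, seq, dim).
batch, heads, seq, dim = 1, 16, 2048, 64
q = torch.randn(batch, heads, seq, dim)
k = torch.randn(batch, heads, seq, dim)
v = torch.randn(batch, heads, seq, dim)

def naive_attention(q, k, v):
    # Materializes S with shape (batch, heads, seq, seq): quadratic in
    # context length. At a 32K context this single fp32 intermediate
    # alone is ~64 GB.
    s = q @ k.transpose(-1, -2) / dim ** 0.5
    return s.softmax(dim=-1) @ v

def chunked_attention(q, k, v, chunk=512):
    # Softmax and the weighted sum only *reduce* over the key axis, so the
    # scores can be processed in chunks with an online softmax; the per-row
    # statistics of shape (..., seq, 1) broadcast against each
    # (..., seq, chunk) score block instead of being tiled.
    out = torch.zeros_like(q)
    row_max = q.new_full((batch, heads, seq, 1), float("-inf"))
    row_sum = q.new_zeros((batch, heads, seq, 1))
    for start in range(0, seq, chunk):
        s = q @ k[..., start:start + chunk, :].transpose(-1, -2) / dim ** 0.5
        new_max = torch.maximum(row_max, s.amax(dim=-1, keepdim=True))
        scale = (row_max - new_max).exp()  # rescale earlier partial results
        p = (s - new_max).exp()
        row_sum = row_sum * scale + p.sum(dim=-1, keepdim=True)
        out = out * scale + p @ v[..., start:start + chunk, :]
        row_max = new_max
    return out / row_sum

# The two versions agree; only the peak intermediate memory differs.
assert torch.allclose(naive_attention(q, k, v), chunked_attention(q, k, v), atol=1e-4)
```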

Mon 3 Mar (all times Pacific Time, US & Canada)

17:00 - 18:00
Session 5: Deep Neural Networks (Session Chair: Wei Niu), Main Conference at Acacia D
17:00 (20m) Talk
FlashTensor: Optimizing Tensor Programs by Leveraging Fine-grained Tensor Property
Runxin Zhong, Yuyang Jin, Chen Zhang, Kinman Lei, Shuangyu Li, Jidong Zhai (Tsinghua University)
17:20 (20m) Talk
Mario: Near Zero-cost Activation Checkpointing in Pipeline Parallelism
Weijian Liu, Mingzhen Li (Institute of Computing Technology, Chinese Academy of Sciences), Guangming Tan (Chinese Academy of Sciences), Weile Jia (Institute of Computing Technology, Chinese Academy of Sciences)
17:40 (20m) Talk
COMPSO: Optimizing Gradient Compression for Distributed Training with Second-Order Optimizers
Baixi Sun (Indiana University Bloomington), Weijin Liu (Stevens Institute of Technology), J. Gregory Pauloski (University of Chicago), Jiannan Tian (Indiana University), Jinda Jia (Indiana University), Daoce Wang (Indiana University), Boyuan Zhang (Indiana University), Mingkai Zheng (Department of Electrical and Computer Engineering, Rutgers University), Sheng Di (Argonne National Laboratory), Sian Jin (Temple University), Zhao Zhang, Xiaodong Yu (Stevens Institute of Technology), Kamil A. Iskra (Argonne National Laboratory), Pete Beckman (Northwestern University and Argonne National Laboratory), Guangming Tan (Chinese Academy of Sciences), Dingwen Tao (Institute of Computing Technology, Chinese Academy of Sciences)