Towards the Limits of Energy Efficiency and Performance of Deep Learning Systems
Yanzhi Wang — Syracuse University
Friday, September 29, 2017
ABSTRACT: Deep learning systems have achieved unprecedented progress in a number of fields such as computer vision, robotics, game playing, unmanned driving and aerial systems, and other AI-related fields. However, rapidly expanding model sizes place significant constraints on both computation and weight storage, for both inference and training, and on both high-performance computing systems and low-power embedded systems and IoT applications. To overcome these limitations, we propose a holistic framework that incorporates structured matrices into deep learning systems and achieves (i) simultaneous reduction in weight storage and computational complexity, (ii) simultaneous speedup of training and inference, and (iii) generality, in that it can be adopted in both software and hardware implementations, on different platforms, and for neural networks of different types, sizes, and scales.
Besides algorithm-level achievements, our framework has (i) a solid theoretical foundation, which proves that our approach converges to the same “effectiveness” as deep learning without compression and demonstrates that it approaches or achieves the theoretical limits of computation and storage in deep learning systems; and (ii) platform-specific implementations and optimizations on smartphones, FPGAs, and ASIC circuits. We demonstrate that our smartphone-based implementation achieves speed similar to that of GPU and existing ASIC implementations on the same application. Our FPGA-based implementations of deep learning systems and LSTM networks achieve an energy efficiency improvement of more than 11X compared with the best state-of-the-art designs, and an even higher energy efficiency gain compared with the IBM TrueNorth neurosynaptic processor. Our proposed framework can achieve 3.5 TOPS of computation performance on FPGAs, and is the first to enable nanosecond-level recognition speed for image recognition tasks.
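To illustrate how a structured weight matrix can reduce storage and computation at the same time, here is a minimal Python sketch. The abstract does not say which structured matrices the framework uses; a circulant matrix is assumed here purely as one familiar example, and the function names (fft, circulant_matvec, dense_matvec) are hypothetical. An n-by-n circulant matrix is fully defined by its first column, so it needs n stored values instead of n^2, and its product with a vector can be computed with FFTs in O(n log n) operations instead of O(n^2).

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def circulant_matvec(c, x):
    """Compute C @ x, where C[i][j] = c[(i - j) % n], from the first column c.

    Only the n entries of c are stored (assumed example of a structured
    matrix); the product is a circular convolution, evaluated via FFT.
    """
    n = len(c)
    spectrum = [a * b for a, b in zip(fft(c), fft(x))]
    y = fft(spectrum, inverse=True)
    return [v.real / n for v in y]  # divide by n to normalize the inverse FFT


def dense_matvec(c, x):
    """Reference O(n^2) product with the same circulant matrix, built explicitly."""
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]


if __name__ == "__main__":
    c = [1.0, 2.0, 3.0, 4.0]   # first column: 4 stored weights instead of 16
    x = [0.5, -1.0, 2.0, 0.0]
    fast = circulant_matvec(c, x)
    slow = dense_matvec(c, x)
    # The two results should agree up to floating-point rounding.
    print(all(abs(a - b) < 1e-9 for a, b in zip(fast, slow)))
```

The dense routine is included only as a correctness reference; the savings come from never materializing the full matrix.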
BIO: Yanzhi Wang has been an assistant professor in the Department of Electrical Engineering and Computer Science at Syracuse University since August 2015. He received his Ph.D. degree in Computer Engineering from the University of Southern California (USC) in 2014, under the supervision of Prof. Massoud Pedram, and his B.S. degree in Electronic Engineering from Tsinghua University in 2009.
Dr. Wang’s current research interests include energy-efficient and high-performance implementations of deep learning and artificial intelligence systems, neuromorphic computing and new computing paradigms, and emerging deep learning algorithms and systems such as Bayesian neural networks, generative adversarial networks (GANs), and deep reinforcement learning. In addition, he works on the application of deep learning and machine intelligence in various mobile and IoT systems, medical systems, and UAVs, as well as the integration of security protection in deep learning systems. He also works on near-threshold computing for IoT devices and energy-efficient cyber-physical systems. His group works on both algorithms and actual implementations (FPGAs, circuit tapeouts, mobile and embedded systems, and UAVs).
His work has been published in top conferences and journals (e.g., ASPLOS, MICRO, ICML, DAC, ICCAD, DATE, ASP-DAC, ISLPED, INFOCOM, ICDCS, TComputer, and TCAD), and has been cited around 3,000 times according to Google Scholar. He has received four Best Paper or Top Paper Awards from major conferences, including IEEE ICASSP (top 3 among 2,000+ submissions), ISLPED, IEEE CLOUD, and ISVLSI. He has another six Best Paper Nominations and two Popular Papers in IEEE TCAD. His group is sponsored by the NSF, DARPA, IARPA, AFRL/AFOSR, Syracuse CASE Center, and industry sources.
Hosted by Paul Bogdan