Tuesday, March 19, 2019

Processing-in-Memory for Energy-efficient Neural Network Training: A Heterogeneous Approach

(Hengyu Zhao Presenting on Wed. 3/20 at 2:00PM ET)


Authors: Hengyu Zhao, Jiawen Liu, Matheus Almeida Ogleari, Dong Li, Jishen Zhao



Abstract: Neural networks (NNs) have been adopted in a wide range of application domains, such as image classification, speech recognition, object detection, and computer vision. However, training NNs – especially deep neural networks (DNNs) – can be energy- and time-consuming because of frequent data movement between processor and memory. Furthermore, training involves massive fine-grained operations with various computation and memory access characteristics. Exploiting high parallelism with such diverse operations is challenging. To address these challenges, we propose a software/hardware co-design of a heterogeneous processing-in-memory (PIM) system. Our hardware design incorporates hundreds of fixed-function arithmetic units and ARM-based programmable cores on the logic layer of a 3D die-stacked memory to form a heterogeneous PIM architecture attached to the CPU. Our software design offers a programming model and a runtime system that program, offload, and schedule various NN training operations across the compute resources provided by the CPU and the heterogeneous PIM. By extending the OpenCL programming model and employing a hardware heterogeneity-aware runtime system, we enable high program portability and easy program maintenance across various heterogeneous hardware, optimize system energy efficiency, and improve hardware utilization.
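To make the abstract's scheduling idea concrete, here is a minimal sketch of what a heterogeneity-aware runtime dispatch might look like. The resource names, operation categories, and the dispatch heuristic below are all illustrative assumptions for this post, not the paper's actual policy or API.

```python
# Hypothetical sketch: route each NN training operation to the compute
# resource it plausibly suits best, in the spirit of a heterogeneity-aware
# runtime. All names and categories here are assumptions, not the paper's.

FIXED_FUNCTION = "PIM fixed-function units"  # dense, regular arithmetic
PROGRAMMABLE = "PIM ARM cores"               # lighter, memory-bound kernels
HOST_CPU = "host CPU"                        # fallback for everything else

# Assumed operation categories (illustrative only).
COMPUTE_INTENSIVE = {"matmul", "conv2d"}
MEMORY_INTENSIVE = {"relu", "pooling", "batchnorm", "elementwise_add"}

def schedule(op: str) -> str:
    """Pick a compute resource for one fine-grained training operation."""
    if op in COMPUTE_INTENSIVE:
        return FIXED_FUNCTION
    if op in MEMORY_INTENSIVE:
        return PROGRAMMABLE
    return HOST_CPU

if __name__ == "__main__":
    for op in ["conv2d", "relu", "softmax"]:
        print(f"{op} -> {schedule(op)}")
```

The point of such a dispatcher is that the mapping decision lives in the runtime, so the same program can run across different heterogeneous hardware configurations without source changes.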