Tuesday, March 19, 2019

Processing-in-Memory for Energy-efficient Neural Network Training: A Heterogeneous Approach

(Hengyu Zhao Presenting on Wed. 3/20 at 2:00PM ET)


Authors: Hengyu Zhao, Jiawen Liu, Matheus Almeida Ogleari, Dong Li, Jishen Zhao



Abstract: Neural networks (NNs) have been adopted in a wide range of application domains, such as image classification, speech recognition, object detection, and computer vision. However, training NNs – especially deep neural networks (DNNs) – can be energy- and time-consuming because of frequent data movement between processor and memory. Furthermore, training involves massive fine-grained operations with various computation and memory access characteristics. Exploiting high parallelism with such diverse operations is challenging. To address these challenges, we propose a software/hardware co-design of a heterogeneous processing-in-memory (PIM) system. Our hardware design incorporates hundreds of fixed-function arithmetic units and ARM-based programmable cores on the logic layer of a 3D die-stacked memory to form a heterogeneous PIM architecture attached to the CPU. Our software design offers a programming model and a runtime system that program, offload, and schedule various NN training operations across the compute resources provided by the CPU and the heterogeneous PIM. By extending the OpenCL programming model and employing a hardware heterogeneity-aware runtime system, we enable high program portability and easy program maintenance across various heterogeneous hardware, optimize system energy efficiency, and improve hardware utilization.
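To make the abstract's scheduling idea concrete, here is a minimal sketch of what a heterogeneity-aware runtime dispatch might look like. The resource names, operation categories, and the dispatch heuristic below are all illustrative assumptions for this post, not the paper's actual policy or API.

```python
# Hypothetical sketch: route each NN training operation to the compute
# resource it plausibly suits best, in the spirit of a heterogeneity-aware
# runtime. All names and categories here are assumptions, not the paper's.

FIXED_FUNCTION = "PIM fixed-function units"  # dense, regular arithmetic
PROGRAMMABLE = "PIM ARM cores"               # lighter, memory-bound kernels
HOST_CPU = "host CPU"                        # fallback for everything else

# Assumed operation categories (illustrative only).
COMPUTE_INTENSIVE = {"matmul", "conv2d"}
MEMORY_INTENSIVE = {"relu", "pooling", "batchnorm", "elementwise_add"}

def schedule(op: str) -> str:
    """Pick a compute resource for one fine-grained training operation."""
    if op in COMPUTE_INTENSIVE:
        return FIXED_FUNCTION
    if op in MEMORY_INTENSIVE:
        return PROGRAMMABLE
    return HOST_CPU

if __name__ == "__main__":
    for op in ["conv2d", "relu", "softmax"]:
        print(f"{op} -> {schedule(op)}")
```

The point of such a dispatcher is that the mapping decision lives in the runtime, so the same program can run across different heterogeneous hardware configurations without source changes.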