Tuesday, April 16, 2019

Tuning Applications for Efficient GPU Offloading to In-memory Processing


(Presenting on Wed. 04/17/2019 at 2:00PM ET)

Authors: Yudong Wu, Mingyao Shen, Yi Hui Chen, Yuanyuan Zhou


Data movement between processors and main memory is a critical bottleneck for data-intensive applications. The problem is even more severe with Graphics Processing Units (GPUs) because of their massive parallel data-processing capability. Recent research has shown that Processing-in-Memory (PIM) can greatly alleviate this bottleneck by reducing traffic between GPUs and memory devices: instead of transferring massive amounts of data between memory devices and processors, it offloads a relatively small execution context to the memory side. However, conventional application code that is highly optimized for locality to execute efficiently on a GPU is not a natural match for PIM offloading. To address this challenge, our project investigates how application code can be restructured to increase the benefit of PIM offloading from GPUs. In addition, we study approaches to dynamically determine how much work to offload, as well as how to leverage all resources, including the GPUs themselves, during offloading to achieve the best possible overall performance. In experimental evaluations over 14 applications, our approach improves offloading performance by 21% on average.
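The "how much to offload" decision can be illustrated with a minimal, hypothetical cost model (the parameters and formulas below are illustrative assumptions, not the paper's actual policy): estimate the time each share of the work would take on the GPU (bus transfer plus fast compute) versus in PIM (in-place, slower compute), and pick the split fraction that minimizes completion time when both sides run concurrently.

```python
def best_offload_fraction(n_bytes, gpu_bw, gpu_flops, pim_flops,
                          flops_per_byte, steps=100):
    """Return (fraction, time): the share of work to offload to PIM that
    minimizes makespan when GPU and PIM execute concurrently.
    All cost-model parameters are illustrative assumptions."""
    best_f, best_t = 0.0, float("inf")
    for i in range(steps + 1):
        f = i / steps                      # fraction offloaded to PIM
        gpu_bytes = n_bytes * (1 - f)
        pim_bytes = n_bytes * f
        # GPU must move its share over the memory bus, then compute on it.
        t_gpu = gpu_bytes / gpu_bw + gpu_bytes * flops_per_byte / gpu_flops
        # PIM computes in place: no bulk transfer, but slower compute units.
        t_pim = pim_bytes * flops_per_byte / pim_flops
        t = max(t_gpu, t_pim)              # both sides run in parallel
        if t < best_t:
            best_f, best_t = f, t
    return best_f, best_t
```

Under this toy model, a memory-bound kernel (low arithmetic intensity) favors offloading most of the data to PIM, while a compute-bound kernel keeps most of the work on the GPU, which matches the intuition behind deciding the offload amount dynamically.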