Tuesday, April 21, 2020

BaM: Enabling Accelerator Memory Accesses into the SSD


(Zaid Qureshi, David Min, and Vikram Sharma Mailthody presenting 4/22/20.) 

Storage class memories (SCM) are considered prime candidates for meeting applications’ growing memory footprints. An ideal SCM for tomorrow’s data center offers terabytes of capacity, latency between a few hundred nanoseconds and a couple of microseconds, energy efficiency, high memory parallelism, scalability, and very low cost. Among the several types of SCM, 3D XPoint and Flash have shown promising results. Compared to 3D XPoint, Flash offers higher throughput thanks to several levels of parallelism, has higher density, consumes very little power per memory access, and has proven to be scalable and cost-efficient.

However, studying Flash as part of the main memory system is challenging because existing simulators and emulators do not provide the needed flexibility and cannot address practical system-level challenges. In this talk, we will discuss our attempt at modeling an SSD using an FPGA. We show that CPUs offer very little memory-level parallelism over PCIe and are inefficient at exploiting the massive parallelism offered by these emerging NVM devices. To increase memory-level parallelism, we connect a GPU to the FPGA. We will then discuss what we learned, along with several application and system-level challenges we encountered.