Shuang Chen from Cornell University
Presenting Wednesday, Apr 10 at 2:00PM EDT
Data centers host latency-critical (LC) as well as best-effort jobs. The former rely critically on adequate provisioning of hardware resources to meet their QoS targets. Many recent industry implementations and research proposals assume a single LC application per node, in part because it makes it easy to carve out resources for that one LC application while allowing best-effort jobs to compete for the rest.
Two big changes in data centers, however, are about to shake up the status quo. First, the microservices model is causing the number of LC applications hosted in data centers to explode, making it impractical (and inefficient) to assume one LC application per node. That means multiple LC applications competing for resources on a single node, each with its own QoS needs. Second, the arrival of processing-in-memory (PIM) capabilities introduces a complex scheduling challenge (and opportunity).
In this talk, I will present our PIMCloud project, show some initial results, and discuss our ongoing work. First, I will present PARTIES, a novel hardware resource manager that enables successful colocation of multiple LC applications on a single node of a traditional data center. (This work will be presented next week at ASPLOS 2019.) Second, I will discuss how we envision augmenting this framework to accommodate PIM capabilities. Specifically, I will discuss challenges and opportunities in future nodes where the memory channels themselves are compute-capable.