Monday, November 30, 2020

ReTail: Request-Level Latency Prediction in Multicore-Enabled Cloud Servers

(Shuang Chen of Cornell presenting December 2, 2020 at 11:00 AM & 8:00 PM ET)

Latency-critical (LC) cloud services, such as web search, have strict quality-of-service (QoS) constraints in terms of tail latency. Improving energy efficiency usually takes second place to meeting these latency constraints. Per-core Dynamic Voltage and Frequency Scaling (DVFS) can offer significant efficiency benefits; however, it is challenging to determine which requests can afford to employ DVFS without hurting the end-to-end tail latency of the entire service.


We introduce ReTail, a framework for QoS-aware power management for LC services using request-level latency prediction. ReTail is composed of (1) a general and systematic process to collect and select the features of an application that best correlate with the processing latency of its requests, (2) a simple yet accurate request-level latency predictor using linear regression, and (3) a runtime power management system that meets the QoS constraints of LC applications while maximizing the server's energy savings. Experimental results show that, compared to the best state-of-the-art per-core power manager, ReTail achieves an average of 11% (up to 48%) energy savings while still meeting QoS constraints.
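
To make the idea of pairing a linear-regression latency predictor with per-core frequency selection more concrete, here is a minimal Python sketch. It is not ReTail's implementation: the single input feature, the function names (fit_linear, predict_latency_ms, pick_frequency), the frequency list, and especially the assumption that processing latency scales inversely with core frequency are all illustrative simplifications introduced here.

```python
from statistics import mean


def fit_linear(xs, ys):
    """Ordinary least squares for one feature: latency ~ slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict_latency_ms(model, feature):
    """Predicted processing latency (ms) at the highest frequency."""
    slope, intercept = model
    return slope * feature + intercept


def pick_frequency(model, feature, qos_ms, freqs_ghz):
    """Return the lowest frequency whose predicted latency still meets qos_ms.

    Assumption (hypothetical): latency scales as f_max / f, i.e. the request
    is fully compute-bound. Falls back to f_max if no frequency fits.
    """
    f_max = max(freqs_ghz)
    base_ms = predict_latency_ms(model, feature)
    for f in sorted(freqs_ghz):
        if base_ms * f_max / f <= qos_ms:
            return f
    return f_max


# Hypothetical training data: one request feature vs. latency measured at f_max.
model = fit_linear([1, 2, 3, 4], [2.0, 4.0, 6.0, 8.0])
chosen = pick_frequency(model, feature=3, qos_ms=10.0, freqs_ghz=[1.2, 1.8, 2.4])
```

In this sketch, a request predicted to be short runs at a lower frequency, while a request predicted to exceed the target even at full speed falls back to the highest frequency. A real system would also need to account for queueing delay and for mispredictions, which this sketch ignores.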