Thursday, December 13, 2018

Joint Parsing for Understanding 3D Scenes and Human Activities in Videos

(Presenting Fri. 12/14.) We propose a computational framework to jointly parse a single RGB image and reconstruct a holistic 3D configuration composed of a set of CAD models using a stochastic grammar model. Specifically, we introduce a Holistic Scene Grammar (HSG) to represent the 3D scene structure, which characterizes a joint distribution over the functional and geometric space of indoor scenes. Furthermore, as the 3D environment becomes larger and more complex, the complexity of the query-reasoning system grows rapidly. Increasingly, these tasks must happen at line speed just to keep up with the rate of new data production, and real-time processing is often needed in order to draw timely inferences. The algorithms currently involve deep learning, dynamic programming, Monte Carlo iteration, graph analytics, and natural language processing. Because these core algorithms are applied widely across AI applications, we extract them into modules that can be accelerated by emerging in-memory processing technologies from CRISP. Deployments of real-time video analytics will need to do as much processing in the cameras as possible, and so will span edge devices to the cloud in implementing an end-to-end solution.
We also have ongoing collaborations to apply our research in the context of various applications, and our project will collect a diverse set of applications (especially from task 3.4) into a benchmark suite of challenging applications. We distill key benchmark tasks relevant to every level of the system, along with associated QoS metrics, and use these to evaluate the effectiveness of the systems and programming environments designed by our lab and other labs under CRISP. A key aspect of developing the benchmark suite is developing domain-specific metrics for system efficiency to complement general-purpose QoS metrics (performance, power, etc.).
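To make the HSG idea above concrete, here is a minimal, purely illustrative sketch of a parse-graph node that carries both a functional label and 3D geometry, scored by a toy joint energy over the functional and geometric space. The class names, attributes, and energy terms are our own simplifications for exposition, not the authors' implementation.

```python
# Illustrative sketch only: a toy parse-graph node and joint energy in the
# spirit of a holistic scene grammar. Names and energy terms are assumptions.
from dataclasses import dataclass, field
from typing import List
import math

@dataclass
class ParseNode:
    label: str                        # functional category, e.g. "bed"
    pose: tuple = (0.0, 0.0, 0.0)     # 3D position of the associated CAD model
    size: tuple = (1.0, 1.0, 1.0)     # 3D bounding-box extents
    children: List["ParseNode"] = field(default_factory=list)

def functional_energy(node: ParseNode, prior: dict) -> float:
    """Penalize functional categories that are unlikely for this scene type."""
    return -math.log(prior.get(node.label, 1e-6))

def geometric_energy(a: ParseNode, b: ParseNode) -> float:
    """Penalize overlapping CAD models (a crude pairwise layout term)."""
    overlap = all(abs(pa - pb) < (sa + sb) / 2
                  for pa, pb, sa, sb in zip(a.pose, b.pose, a.size, b.size))
    return 10.0 if overlap else 0.0

def scene_energy(root: ParseNode, prior: dict) -> float:
    """Joint energy over the functional and geometric space of a parse graph."""
    nodes, stack = [], [root]
    while stack:
        n = stack.pop()
        nodes.append(n)
        stack.extend(n.children)
    e = sum(functional_energy(n, prior) for n in nodes)
    e += sum(geometric_energy(a, b) for i, a in enumerate(nodes)
             for b in nodes[i + 1:])
    return e
```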

Tuesday, December 11, 2018

Perceptual Compression for Video Storage and Processing Systems

(Presenting 12/12)

Compressed videos constitute 70% of Internet traffic, and video upload growth rates far outpace compute and storage improvement trends. Leveraging perceptual cues like saliency, i.e., regions where viewers focus their perceptual attention, can reduce compressed video size while maintaining perceptual quality, but requires significant changes to video codecs and ignores the data management of this perceptual information.

In this talk, we describe Vignette, a new compression technique and storage manager for perception-based video compression. Vignette complements off-the-shelf compression software and hardware codec implementations. Vignette’s compression technique uses a neural network to predict saliency information used during transcoding, and its storage manager integrates perceptual information into the video storage system to support a perceptual compression feedback loop. Vignette’s saliency-based optimizations reduce storage by up to 95% with minimal quality loss, and Vignette videos lead to power savings of 50% on mobile phones during video playback.
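As a rough illustration of the saliency-based optimization described above, the sketch below maps a precomputed per-pixel saliency map to per-tile quantization offsets, so that background tiles are compressed more aggressively. The tile size, QP-offset mapping, and function names are assumptions for exposition, not Vignette's actual codec integration.

```python
# A minimal sketch of saliency-guided bit allocation, assuming a precomputed
# per-pixel saliency map in [0, 1]. Tile size and mapping are illustrative.
import numpy as np

def tile_qp_offsets(saliency: np.ndarray, tile: int = 64,
                    max_offset: int = 10) -> np.ndarray:
    """Map mean per-tile saliency to quantization-parameter offsets:
    salient tiles keep low QP (high quality), background tiles get higher QP."""
    rows, cols = saliency.shape[0] // tile, saliency.shape[1] // tile
    offsets = np.zeros((rows, cols), dtype=int)
    for r in range(rows):
        for c in range(cols):
            block = saliency[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile]
            # Low saliency -> large positive offset (coarser quantization).
            offsets[r, c] = round(max_offset * (1.0 - block.mean()))
    return offsets

# Example: a frame where only the center region is salient.
sal = np.zeros((1024, 1920))
sal[384:640, 768:1152] = 1.0
print(tile_qp_offsets(sal))  # near-zero offsets in the center, max elsewhere
```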

Monday, December 10, 2018

HotSpot Extensions for Microchannels

(Trey West, UVA, presenting on Mon 12/10/18.) 
Modern applications present issues that are becoming increasingly difficult to solve with traditional 2D architectures, which has necessitated research into 3D processing and memory architectures. A dominant problem in this field is that 3D architectures produce more heat than 2D architectures, yet are difficult to cool using traditional 2D approaches like heat sinks. New 3D cooling techniques are necessary, and with them, new ways to model these cooling techniques. My research has focused on taking HotSpot, an existing thermal modeling tool capable of modeling 3D architectures, and adding functionality for modeling 3D cooling techniques, with an emphasis on microchannel cooling.
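To give a flavor of what a microchannel extension must capture, here is a toy one-dimensional energy-balance sketch of a single channel: the coolant warms up along the flow direction, and the local wall temperature sits above it by a convective film resistance. The parameter values and the lumped model are illustrative assumptions, not HotSpot's implementation.

```python
# Toy 1-D energy balance for a single microchannel; parameters are assumed.
def microchannel_temps(heat_w, t_inlet=300.0, m_dot=1e-4, cp=4180.0,
                       h_conv=20000.0, seg_area=1e-6):
    """heat_w: list of heat inputs (W) per channel segment along the flow.
    Returns (coolant temperature, wall temperature) per segment, in kelvin."""
    coolant, wall = [], []
    t = t_inlet
    for q in heat_w:
        t += q / (m_dot * cp)                     # coolant heats up downstream
        coolant.append(t)
        wall.append(t + q / (h_conv * seg_area))  # convective film resistance
    return coolant, wall

# Example: ten segments, each dissipating 0.5 W under a hotspot.
cool, wall = microchannel_temps([0.5] * 10)
print([round(x, 1) for x in wall])
```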

Thursday, December 6, 2018

High-performance In-Memory Data Partitioning

[Presenting on Friday, Dec 7th]

Data partitioning is an important primitive for in-memory data processing systems, and in many cases it is the key performance bottleneck. This primitive has been the focus of many studies in the past. However, as we argue in this talk, these previous studies have been narrow in scope, leaving many unanswered questions that are of paramount importance in practice. Consequently, to the best of our knowledge, there is no clear answer to the seemingly simple question of what an efficient partitioning strategy for in-memory data systems is. In this talk, we carefully consider this data partitioning primitive in the context of multi-core in-memory data settings. We look at past work in this area and note that many of these studies overlook important aspects such as the impact of tuple size and of the data format (e.g., row-store vs. column-store). We build on this initial observation and examine a number of data partitioning strategies, leading to a better understanding of how data partitioning methods perform on modern multi-core, large-memory systems. We note a few interesting observations, including how relatively simple methods work quite well in practice across a broad spectrum of data parameters. To help future researchers, we propose a partitioning benchmark so that work in this area can take a broader and more realistic perspective when evaluating data partitioning methods. Overall, the key contribution of this talk is to separate the wheat from the chaff in previous research in this area, analyze the relative performance of various methods on a broad set of data parameters, and help provide a more systematic evaluation framework for future work. We also point to opportunities for new research directions in this area.
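For readers unfamiliar with the primitive itself, the sketch below shows the classic two-pass (histogram, then scatter) structure that many in-memory partitioning implementations share, written for clarity rather than speed. The fan-out, key function, and layout are illustrative assumptions, not any of the tuned strategies evaluated in the talk.

```python
# A minimal sketch of the hash-partitioning primitive; fan-out and layout
# are illustrative, and no attempt is made at cache- or NUMA-awareness.
def partition(tuples, key, fanout=16):
    """Two-pass partitioning: count per-partition sizes, then scatter tuples
    into contiguous output slots (avoids growing per-partition buffers)."""
    counts = [0] * fanout
    for t in tuples:
        counts[hash(key(t)) % fanout] += 1
    # Prefix sum gives each partition's starting offset in the output array.
    offsets, total = [], 0
    for c in counts:
        offsets.append(total)
        total += c
    out = [None] * total
    cursor = offsets[:]
    for t in tuples:
        p = hash(key(t)) % fanout
        out[cursor[p]] = t
        cursor[p] += 1
    return out, offsets

rows = [(i, f"payload-{i}") for i in range(1000)]
out, offsets = partition(rows, key=lambda r: r[0])
```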

Wednesday, December 5, 2018

Integrated Data Transfer and Address Translation for CPU-GPU Environments

(Presenting 12/5)
Increasingly, accelerators such as GPUs, FPGAs, and TPUs are being co-located with the main CPU on server systems to deal with the variable needs across and within workloads. While much prior work has optimized data movement among homogeneous components, the possible inefficiencies of similar data transfers across heterogeneous components have not been explored as much. In this talk, we will present the costs of excessive data transfers in many current CPU-GPU systems, and their poor scalability as we evolve into multi-GPU systems. Since data transfers happen at coarse (page) granularities and on demand, they result in poor efficiency, especially when non-useful data is moved as well. We propose (i) compiler-based approaches to re-lay out the data before the movement and (ii) novel address translation mechanisms to handle the consequences of the new data layout. We will present experimental results showing the benefits of this approach.
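The following sketch illustrates the data-relayout intuition in a simplified setting: if a GPU kernel reads only one field of each record, packing that field contiguously (structure-of-arrays) lets the system transfer far fewer bytes than shipping whole pages of the original layout. The record format and sizes are assumptions for exposition, not the proposed compiler or address translation mechanism.

```python
# Illustrative only: array-of-structures vs. a packed single-field column,
# comparing how many bytes would have to cross the CPU-GPU bus.
import numpy as np

record = np.dtype([("key", np.int64), ("pos", np.float32, 3),
                   ("payload", np.uint8, 60)])            # 80 bytes per record
aos = np.zeros(1_000_000, dtype=record)                   # array of structures

# Naive transfer: the whole AoS buffer crosses the bus on demand, page by page.
naive_bytes = aos.nbytes

# Relayout: pack only the field the kernel actually reads ("pos").
soa_pos = np.ascontiguousarray(aos["pos"])                # dense SoA column
relayout_bytes = soa_pos.nbytes

print(naive_bytes, relayout_bytes)  # ~80 MB vs ~12 MB in this toy case
```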