Monday, September 26, 2022

An MLIR-based Intermediate Representation for Accelerator Design with Decoupled Customizations

Hongzheng Chen and Niansong Zhang presenting on Wed. 9/28/22.

The growing number of specialized accelerators deployed in data centers and on edge devices calls for methods to generate high-performance accelerators efficiently. However, custom processing engines, memory hierarchies, data types, and data communication complicate accelerator design. In this talk, we will present HCL, our MLIR-based accelerator IR, which decouples the algorithm from hardware customizations at the IR level. We will provide several case studies to demonstrate how our IR supports a wide range of applications, how our IR primitives can be composed into different designs, and how we achieve high performance and productivity at the same time. Finally, we will discuss the benefits of this work and our ongoing efforts.
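To make the decoupling idea concrete, below is a minimal, hypothetical Python sketch. The names and primitives are illustrative assumptions, not the actual HCL dialect or its API: the algorithm is written once as plain code, while the hardware customizations live in a separate specification that a compiler pass would apply at the IR level.

```python
# Illustrative sketch only: "customizations" below is a hypothetical,
# simplified stand-in for decoupled customization primitives.
import numpy as np

# --- Algorithm: a plain GEMM, written once and never edited for hardware ---
def gemm(A, B):
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(M):
        for j in range(N):
            for k in range(K):
                C[i, j] += A[i, k] * B[k, j]
    return C

# --- Hardware customizations: kept apart from the algorithm above ---
# Each entry names a loop, tensor, or type in the algorithm and a
# transformation a compiler pass would apply to the IR.
customizations = [
    ("split",    {"loop": "j", "factor": 8}),               # loop tiling
    ("pipeline", {"loop": "j.inner", "II": 1}),              # pipelining
    ("reuse_at", {"tensor": "A", "axis": "k"}),              # on-chip reuse buffer
    ("quantize", {"tensor": "C", "dtype": "fixed(16, 8)"}),  # custom data type
]

# The functional behavior is defined solely by gemm(); the customizations
# change how it is mapped to hardware, not what it computes.
print(gemm(np.ones((4, 4)), np.ones((4, 4)))[0, 0])  # -> 4.0
```

Because the customizations are a separate artifact, the same algorithm can be paired with different customization sets to target different accelerator designs.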

Thursday, September 15, 2022

Accelerating Few Shot Learning with HD Computing in ReRAM

Weihong Xu, UCSD, presenting on Wed. 9/21/22 at 1:00 pm & 7:00 pm ET. 

Hyperdimensional (HD) computing is a lightweight algorithm with fast learning capability that can efficiently realize classification and few-shot learning (FSL) workloads. However, the traditional von Neumann architecture is highly inefficient for HD algorithms due to its limited memory bandwidth and capacity. Processing in-memory (PIM) is an emerging computing paradigm that addresses these issues by using memories as computing units. This talk introduces efficient PIM architectures for HD computing as well as the application of HD to few-shot classification. First, we present Tri-HD, a PIM-based HD computing architecture on ReRAM that accelerates all phases of the general HD computing pipeline, namely encoding, training, retraining, and inference. Tri-HD is enabled by efficient in-memory logic operations and shows orders-of-magnitude performance improvements over a CPU. However, Tri-HD is not suitable for area- and power-constrained devices, since it suffers from high HD encoding complexity and a complex dataflow. To this end, we present FSL-HD, an algorithm and PIM co-design that realizes energy-efficient FSL. FSL-HD significantly reduces the encoding complexity by more than 10x and is equipped with a dataflow optimized for practical ReRAM. As a result, FSL-HD shows superior FSL accuracy, flexibility, and hardware efficiency compared to state-of-the-art ReRAM-based FSL accelerators.
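Since the abstract only names the pipeline phases, here is a minimal NumPy sketch of a generic HD classification pipeline (encoding by random projection, training by bundling, retraining on misclassified samples, and inference by similarity search). It is illustrative only; the dimensionality and encoder choice are assumptions and do not reflect the Tri-HD or FSL-HD architectures or their in-memory implementations.

```python
# Generic software sketch of the HD computing pipeline phases named above.
import numpy as np

rng = np.random.default_rng(0)
D = 10_000        # hypervector dimensionality (illustrative choice)
F = 64            # input feature dimensionality
NUM_CLASSES = 5

# Encoding: random projection into a high-dimensional bipolar space.
projection = rng.choice([-1, 1], size=(F, D))

def encode(x):
    return np.sign(x @ projection)

# Training: bundle (sum) the encoded samples of each class into a prototype.
def train(samples, labels):
    prototypes = np.zeros((NUM_CLASSES, D))
    for x, y in zip(samples, labels):
        prototypes[y] += encode(x)
    return prototypes

# Inference: classify by cosine similarity to the class prototypes.
def classify(x, prototypes):
    hv = encode(x)
    sims = prototypes @ hv / (np.linalg.norm(prototypes, axis=1)
                              * np.linalg.norm(hv) + 1e-9)
    return int(np.argmax(sims))

# Retraining: nudge prototypes on misclassified samples.
def retrain(samples, labels, prototypes, lr=1.0):
    for x, y in zip(samples, labels):
        pred = classify(x, prototypes)
        if pred != y:
            hv = encode(x)
            prototypes[y] += lr * hv
            prototypes[pred] -= lr * hv
    return prototypes

# Toy usage with one sample per class (few-shot flavor) on random data.
samples = rng.normal(size=(NUM_CLASSES, F))
labels = list(range(NUM_CLASSES))
prototypes = retrain(samples, labels, train(samples, labels))
print(classify(samples[2], prototypes))  # prototype 2 contains this encoding -> 2
```

In a PIM realization, the bulk operations in this sketch, namely the large matrix-vector products in encode() and classify(), are the natural candidates for in-memory execution, which is the source of the efficiency gains described in the abstract.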