Friday, November 30, 2018

Ultra-Dense, Low Power and Resilient Physical Unclonable Function Based on 3D NAND Flash Memory Array

(Presenting Mon. 12/3) -- 3D NAND flash memory has become an integral part of cyber-physical systems coping with the huge data explosion of the Internet of Things (IoT). Moreover, hardware security primitives such as physical unclonable functions (PUFs) have become indispensable in the functional circuits of these systems for protection against security vulnerabilities and adversarial attacks. In this talk, we present for the first time a PUF that exploits the intrinsic variability in the string current of the ubiquitous 3D NAND flash memory, which arises from process variations and inherent material imperfections such as grain boundaries and their associated traps. The proposed PUF exhibits excellent performance metrics, including uniformity (50%), diffuseness (50%), and uniqueness (50.08%), and is resilient to machine-learning attacks. The ultra-dense 3D NAND flash memory array also enables a significantly large set of challenge-response pairs (CRPs) for strong PUF operation.
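The metrics quoted above follow standard definitions: uniformity is the fraction of 1s within one device's response, and uniqueness is the average pairwise inter-device Hamming distance as a fraction of response length (50% is ideal for both). A minimal sketch of these computations, using random bits as stand-ins for real PUF responses:

```python
# Standard PUF quality metrics, computed on simulated responses.
# Random bits stand in for measurements from real devices.
import random

def uniformity(response):
    # Fraction of 1s in a single response (ideal: 0.5).
    return sum(response) / len(response)

def hamming_distance(a, b):
    return sum(x != y for x, y in zip(a, b))

def uniqueness(responses):
    # Average pairwise inter-device Hamming distance, normalized
    # by response length (ideal: 0.5).
    n = len(responses)
    total, pairs = 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += hamming_distance(responses[i], responses[j])
            pairs += 1
    return total / (pairs * len(responses[0]))

random.seed(0)
# Ten simulated "devices", each producing a 256-bit response.
devices = [[random.randint(0, 1) for _ in range(256)] for _ in range(10)]
print(uniformity(devices[0]))
print(uniqueness(devices))
```

For ideally random responses both values land near 0.5, which is why the abstract's measured 50% / 50.08% figures indicate a high-quality PUF.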

Thursday, November 29, 2018

TVM: An Automated End-to-End Optimizing Compiler for Deep Learning

(Presenting on Friday, 11/30/18) There is an increasing need to bring machine learning to a wide diversity of hardware devices. Current frameworks rely on vendor-specific operator libraries and optimize for a narrow range of server-class GPUs. Deploying workloads to new platforms -- such as mobile phones, embedded devices, and accelerators (e.g., FPGAs, ASICs) -- requires significant manual effort. In this talk, we introduce TVM, a compiler that exposes graph-level and operator-level optimizations to provide performance portability for deep learning workloads across diverse hardware back-ends. TVM solves optimization challenges specific to deep learning, such as high-level operator fusion, mapping to arbitrary hardware primitives, and memory latency hiding. It also automates the optimization of low-level programs to hardware characteristics by employing a novel, learning-based cost modeling method for rapid exploration of code optimizations.
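Operator fusion, one of the optimizations mentioned above, merges consecutive element-wise operators so the intermediate result is never written to memory. A conceptual sketch of the idea (plain Python, not TVM's actual API):

```python
# Conceptual illustration of operator fusion (not TVM's API):
# fusing two element-wise operators avoids materializing the
# intermediate buffer, roughly halving memory traffic.

def relu(x):
    return x if x > 0 else 0.0

def unfused(data, bias):
    # Two passes: the add writes an intermediate buffer,
    # which the relu must then read back from memory.
    tmp = [a + b for a, b in zip(data, bias)]
    return [relu(t) for t in tmp]

def fused(data, bias):
    # One pass: each element is loaded, transformed, and stored once.
    return [relu(a + b) for a, b in zip(data, bias)]

data = [1.0, -2.0, 3.0]
bias = [0.5, 0.5, -4.0]
assert unfused(data, bias) == fused(data, bias)  # [1.5, 0.0, 0.0]
```

In TVM this transformation is applied automatically at the graph level, and the fused kernel is then lowered to each hardware back-end.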

Tuesday, November 27, 2018

HeteroCL: An Intermediate Programming Abstraction for Heterogeneous Computing

(Presenting on Wed. 11/28/18) With the pursuit of better performance under strict physical constraints, there is an increasing need to deploy applications onto heterogeneous compute architectures with accelerators. Among these accelerators, intelligent memory and storage (IMS) architectures place computation as close as possible to the memory cells. This kind of architecture can greatly increase parallelism and energy efficiency, enabling data-intensive applications to run efficiently.

This project aims at developing an intuitive programming model that provides high-level abstractions for programming heterogeneous accelerator architectures, including FPGAs and IMS accelerators.

Thursday, November 15, 2018

A 3D In-memory Architecture for Exploiting Internal Memory Bandwidth in an Area-constrained Logic Layer 

Bulk primitives such as element-wise operations on large vectors, reduction, and scan appear in many applications. The required computation per element is low for such operations. As a result, the cost of moving data from DRAM to a separate processor surpasses the cost of computation, and data movement consequently comprises a significant portion of execution time and energy consumption. To alleviate this cost, prior work has proposed two main approaches: (i) adding logic to the row buffers of memory arrays, and (ii) adding processing elements to the logic layer of 3D stacked memories. The first approach imposes a significant hardware overhead and is not flexible enough to support all the required operations, so the second approach seems more practical. However, due to the limited area of the logic layer, processing elements with a traditional architecture small enough to fit there cannot provide enough parallelism to consume all the available internal memory bandwidth. The goal of this project is to propose a new architecture that is efficient enough to consume all the available bandwidth yet small enough to fit in the logic layer.
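The low computation-per-element claim can be made concrete with a back-of-the-envelope arithmetic-intensity calculation for an element-wise vector add (the element count and width below are illustrative assumptions, not figures from the project):

```python
# Arithmetic intensity of an element-wise add c[i] = a[i] + b[i].
# Sizes are illustrative assumptions for the estimate.
n = 1 << 20                        # 1M elements
elem_bytes = 4                     # 32-bit values
flops = n                          # one add per element
bytes_moved = 3 * n * elem_bytes   # read a, read b, write c
intensity = flops / bytes_moved    # FLOPs per byte of DRAM traffic
print(intensity)                   # 1/12 ~ 0.083 FLOPs/byte
```

At roughly 0.08 FLOPs per byte, such kernels are heavily memory-bound on any processor behind a conventional DRAM bus, which is exactly why exploiting the much larger internal bandwidth of 3D stacked memory is attractive.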


(Presented on Monday, 11/26/18)

Welcome to our blog!

CRISP researchers and students will share posts here three times per week beginning November 26.