(Xiao Liu, UCSD, presenting on July 8, 2020 at 11:00 AM & 7:00 PM ET)
Emerging resistive memory (RRAM) based crossbars are a promising technology for accelerating neural network applications.
Such a structure can support massively parallel multiply-accumulate (MAC) operations, which are used intensively in convolutional neural networks (CNNs).
This structure has been demonstrated to offer higher performance and power efficiency than CMOS-based accelerators.
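As a rough illustration of why the crossbar is a natural fit for MACs, here is a minimal NumPy sketch (a software analogy, not the hardware model from the talk): programming weights as cell conductances turns a matrix-vector product into a single parallel analog read.

import numpy as np

# Hypothetical 4x4 crossbar: weights are stored as cell conductances G,
# inputs arrive as word-line voltages V. By Kirchhoff's law, each
# bit-line current is the dot product of V with one column of G, so the
# entire multiply-accumulate completes in one parallel analog step.
G = np.random.rand(4, 4)  # conductance matrix (programmed weights)
V = np.random.rand(4)     # input voltages (layer activations)
I = V @ G                 # bit-line currents = 4 MACs computed at once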
However, previously proposed RRAM-based neural network designs lack several desirable features for neural network accelerators.
First, the pipeline of existing architectures is inefficient, as data dependencies between different layers of the network can significantly stall execution.
Second, existing RRAM-based accelerators suffer from limited flexibility.
Oversized networks cannot be executed on the accelerator, while undersized networks fail to utilize all of the RRAM crossbar arrays.
To address these issues, we propose Mirage, a novel architectural design to enable high parallelism and flexibility for RRAM-based CNN accelerators.
Mirage consists of a Fine-grained Parallel RRAM Architecture (FPRA) and Auto Assignment (AA).
Motivated by the thread block design in GPUs, FPRA addresses the data dependency issue in the pipeline.
When inter-layer parallelism is involved, FPRA unifies the data dependencies of each layer and handles them with shared input and output memory.
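To make the shared-memory idea concrete, here is a hedged Python sketch (the buffer and layer names are our assumptions, not Mirage's actual design): two pipelined layers communicate through one shared activation buffer, so the consumer starts as soon as data appears instead of stalling on the whole feature map.

from queue import Queue
from threading import Thread

shared = Queue(maxsize=4)  # shared input/output memory (illustrative)

def layer1(tiles):
    for t in tiles:
        shared.put(t * 2)  # stand-in for a crossbar MAC on one tile

def layer2(n, out):
    for _ in range(n):
        out.append(shared.get() + 1)  # consumes each tile as it arrives

tiles, out = list(range(8)), []
producer = Thread(target=layer1, args=(tiles,))
consumer = Thread(target=layer2, args=(len(tiles), out))
producer.start(); consumer.start()
producer.join(); consumer.join()
print(out)  # layer 2 overlapped with layer 1, yet dependency-safe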
AA provides the ability to execute any-sized network on the accelerator.
When the network is oversized, AA utilizes dynamic reconfiguration to fold the network to fit the available hardware.
When the network is undersized, AA utilizes the FPRA to maximize the use of the extra hardware for higher performance.
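The two cases above can be summarized in a small decision sketch (the function and field names are hypothetical, for illustration only): fold an oversized layer into multiple passes over the same arrays, or replicate an undersized one across the spare arrays.

import math

def auto_assign(arrays_needed, arrays_available):
    """Illustrative assignment policy: fold if oversized, replicate if not."""
    if arrays_needed > arrays_available:
        # Oversized: time-share the physical arrays over several passes,
        # reprogramming between passes (dynamic reconfiguration).
        return {"mode": "fold",
                "passes": math.ceil(arrays_needed / arrays_available)}
    # Undersized: duplicate the mapping across spare arrays so extra
    # inputs are processed in parallel (FPRA-style parallelism).
    return {"mode": "replicate",
            "copies": arrays_available // arrays_needed}

print(auto_assign(12, 8))  # {'mode': 'fold', 'passes': 2}
print(auto_assign(3, 8))   # {'mode': 'replicate', 'copies': 2}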
We evaluate Mirage on seven popular image recognition neural network models with various network sizes.
We find that Mirage achieves a 2.0x average speedup compared to the state-of-the-art RRAM-based accelerator.
Additionally, Mirage can adapt networks to RRAM-based accelerators of various sizes, and we show that it delivers better performance scalability than prior works.