Abstract:
Reducing redundant computation is a common way to accelerate neural networks and improve computational efficiency. Weight pruning is an effective model compression technique that removes redundant weights. However, most existing unstructured pruning methods do not consider the crossbar structure of Resistive Random Access Memory (RRAM) memristors. Conversely, structured pruning methods match the Memristive Crossbar Array (MCA) structure well but may degrade network accuracy due to their coarser pruning granularity. In this paper, we propose a mixed-granularity pruning method that effectively reduces the hardware overhead of RRAM-based accelerators. The proposed method classifies the columns of each weight sub-matrix by their level of redundancy and applies a different pruning strategy to each class of columns, fully exploiting the redundancy of Convolutional Neural Networks (CNNs). Compared to existing methods, the proposed method achieves compression ratio and energy efficiency improvements of approximately 2.0× and 1.6×, respectively, with less accuracy loss.
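To make the mixed-granularity idea concrete, the following is a minimal sketch, not the authors' implementation: columns of a weight sub-matrix are scored by a hypothetical redundancy measure (mean absolute weight), highly redundant columns are pruned whole (structured, so the corresponding crossbar columns can be removed), and the remaining columns are pruned element-wise (unstructured, finer granularity). The thresholds and the scoring rule are illustrative assumptions.

```python
import numpy as np

def mixed_granularity_prune(W, col_thresh=0.2, elem_thresh=0.05):
    """Prune a weight sub-matrix W (rows x cols) with mixed granularity.

    col_thresh and elem_thresh are hypothetical thresholds, not values
    from the paper.
    """
    W = W.copy()
    # Hypothetical redundancy score: mean absolute weight per column.
    col_score = np.abs(W).mean(axis=0)
    # Highly redundant columns (low score): remove the whole column,
    # matching the MCA column granularity.
    redundant_cols = col_score < col_thresh
    W[:, redundant_cols] = 0.0
    # Remaining columns: prune individual small weights (unstructured).
    keep_cols = ~redundant_cols
    sub = W[:, keep_cols]
    sub[np.abs(sub) < elem_thresh] = 0.0
    W[:, keep_cols] = sub
    return W, redundant_cols

rng = np.random.default_rng(0)
W = rng.normal(scale=0.3, size=(8, 6))
pruned, dropped = mixed_granularity_prune(W)
print("columns pruned whole:", np.flatnonzero(dropped))
print("overall sparsity:", (pruned == 0).mean())
```

In this sketch, whole-column pruning yields hardware savings on the crossbar, while element-wise pruning of the surviving columns preserves accuracy by keeping the granularity fine where the weights matter most.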