Research on Mixed-grained Pruning Method for Memristive Neural Network Accelerator

    • Abstract: Eliminating redundancy to reduce ineffective computation is a common way to accelerate neural networks and improve computational efficiency. Weight pruning is a widely used model compression technique that lowers computational cost by removing redundant weights. However, most existing unstructured pruning methods do not account for the Memristive Crossbar Array (MCA) structure of Resistive Random Access Memory (RRAM). In contrast, structured pruning methods fit the MCA structure well but tend to degrade network accuracy because of their coarse pruning granularity. This paper proposes a mixed-granularity pruning method that effectively reduces the hardware overhead of RRAM-based memristive neural network accelerators. The proposed method classifies weight sub-matrix columns by their degree of redundancy and applies a different pruning strategy to each class, fully exploiting the redundancy of Convolutional Neural Networks (CNNs). Compared with existing methods, the proposed method improves compression ratio and energy efficiency by approximately 2.0× and 1.6×, respectively, with less accuracy loss.
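The column-wise classification described in the abstract can be sketched as follows. This is a minimal illustrative implementation, not the paper's exact algorithm: the redundancy metric (fraction of small-magnitude weights per sub-matrix column), the magnitude threshold, and the column-pruning ratio are all assumptions for demonstration. Columns that are almost entirely redundant are removed whole (structured pruning, so the corresponding MCA column can be skipped in hardware), while the remaining columns are pruned at the level of individual weights (fine-grained pruning).

```python
import numpy as np

def mixed_granularity_prune(w, mag_thresh=0.05, col_prune_ratio=0.9):
    """Sketch of mixed-granularity pruning for an MCA-mapped weight matrix.

    Thresholds and the redundancy metric are illustrative assumptions,
    not the paper's exact criteria.
    """
    w = w.copy()
    small = np.abs(w) < mag_thresh           # per-weight redundancy mask
    col_redundancy = small.mean(axis=0)      # fraction of small weights per column
    structured = col_redundancy >= col_prune_ratio
    # Structured pruning: zero whole sub-matrix columns (crossbar columns).
    w[:, structured] = 0.0
    # Fine-grained pruning: zero only the small weights in remaining columns.
    w[small & ~structured[np.newaxis, :]] = 0.0
    return w, structured

# Example: column 0 is almost all near-zero and is pruned whole;
# column 1 keeps its large weights and loses only the small one.
weights = np.array([[0.01, 0.5],
                    [0.02, 0.01],
                    [0.00, 0.8]])
pruned, pruned_cols = mixed_granularity_prune(weights)
```

The split between the two strategies is what lets the compressed network stay compatible with the crossbar layout: structured column removal shrinks the crossbar mapping directly, while the fine-grained zeros only reduce analog computation within the columns that remain.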

       
