The ICLR 2019 best paper "THE LOTTERY TICKET HYPOTHESIS: FINDING SPARSE, TRAINABLE NEURAL NETWORKS" proposed the lottery ticket hypothesis: "dense, randomly-initialized, feed-forward networks contain subnetworks (winning tickets) that—when trained in isolation—reach test accuracy comparable to the original network in a similar number of iterations."
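For context, the winning tickets in that paper are found by iterative magnitude pruning: train the dense network, prune a fraction of the smallest-magnitude weights, reset the survivors to their original initialization, and repeat. Below is a minimal sketch of that loop (my own illustration, not the authors' code; `train_fn` is a hypothetical user-supplied routine that trains the model while keeping masked-out weights at zero):

```python
import copy
import torch

def find_winning_ticket(model, train_fn, prune_rate=0.2, rounds=5):
    """Iterative magnitude pruning sketch. `train_fn(model, masks)` is a
    hypothetical caller-supplied routine that trains `model` while keeping
    masked-out weights at zero."""
    init_state = copy.deepcopy(model.state_dict())            # theta_0
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()}

    for _ in range(rounds):
        train_fn(model, masks)                                 # 1) train to convergence
        for name, param in model.named_parameters():
            alive = masks[name].bool()
            k = int(prune_rate * int(alive.sum()))             # 2) prune p% of surviving weights
            if k == 0:
                continue
            magnitudes = param.detach().abs()[alive]
            threshold = magnitudes.kthvalue(k).values
            masks[name][alive & (param.detach().abs() <= threshold)] = 0.0
        model.load_state_dict(init_state)                      # 3) rewind survivors to theta_0
        with torch.no_grad():
            for name, param in model.named_parameters():
                param.mul_(masks[name])                        # winning ticket = theta_0 * mask
    return model, masks
```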
I also (rather sloppily) took notes on this line of work in [Literature Notes] Sparsity in Deep Learning: Pruning and growth for efficient inference and training in NN.
This post aims to briefly cover some representative works. In terms of "when to sparsify", this line of work can be divided into: Sparsify after training, Sparsify during training, and Sparse training. I am more interested in the latter two (since they are end-to-end), so this post will (probably) focus more on works in those two subcategories.
Steps:
Code:
Steps:
Features:
Code:
The paper "Do We Actually Need Dense Over-Parameterization? In-Time Over-Parameterization in Sparse Training" proposed the In-Time Over-Parameterization metric:
The paper argues that, provided the exploration is reliable, the higher this metric, i.e. the more parameters the model explores during training, the better the final performance.
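As a rough illustration (my own reconstruction rather than the paper's code), the metric can be read as the fraction of the dense model's parameters that have been activated by the sparse mask at least once during training; a small tracker might look like this:

```python
class ITOPTracker:
    """Tracks the In-Time Over-Parameterization rate: the fraction of all
    (dense) parameters that have been activated in the sparse mask at least
    once during training."""

    def __init__(self, masks):
        # masks: dict {param_name: 0/1 tensor} describing the current connectivity
        self.ever_active = {n: m.bool().clone() for n, m in masks.items()}

    def update(self, masks):
        # call after every prune/grow step that changes the connectivity
        for n, m in masks.items():
            self.ever_active[n] |= m.bool()

    def rate(self):
        explored = sum(int(e.sum()) for e in self.ever_active.values())
        total = sum(e.numel() for e in self.ever_active.values())
        return explored / total    # grows toward 1 as more weights get explored
```

Here `masks` are the same 0/1 connectivity tensors a dynamic sparse trainer already maintains; calling `update()` after every topology change and `rate()` at the end gives the In-Time Over-Parameterization rate.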
The training procedure is therefore similar to that of "Rigging the Lottery: Making All Tickets Winners", except that it uses Sparse Evolutionary Training (SET), from "Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science", whose grow step is random. This choice is made because:
SET activates new weights in a random fashion which naturally considers all possible parameters to explore. It also helps to avoid the dense over-parameterization bias introduced by the gradient-based methods e.g., The Rigged Lottery (RigL) (Evci et al., 2020a) and Sparse Networks from Scratch (SNFS) (Dettmers & Zettlemoyer, 2019), as the latter utilize dense gradients in the backward pass to explore new weights
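To make the contrast concrete, here is a minimal sketch of one SET-style topology update under the assumptions above (prune the smallest-magnitude active weights, then grow the same number of connections at uniformly random inactive positions); it is an illustration, not the reference SET implementation:

```python
import torch

def set_mask_update(weight, mask, update_frac=0.3):
    """One SET-style topology update for a single layer (a sketch, not the
    reference implementation). `weight` and `mask` have the same shape; `mask`
    holds 0/1 values."""
    active = mask.bool()
    n_update = int(update_frac * int(active.sum()))
    if n_update == 0:
        return mask

    # 1) prune: drop the smallest-magnitude currently-active weights
    magnitudes = weight.detach().abs().masked_fill(~active, float('inf'))
    drop_idx = torch.topk(magnitudes.view(-1), n_update, largest=False).indices
    mask.view(-1)[drop_idx] = 0

    # 2) grow: re-activate the same number of inactive positions chosen
    #    uniformly at random (RigL/SNFS would instead pick by dense-gradient magnitude)
    inactive_idx = (~mask.bool()).view(-1).nonzero(as_tuple=True)[0]
    grow_idx = inactive_idx[torch.randperm(inactive_idx.numel())[:n_update]]
    mask.view(-1)[grow_idx] = 1
    # newly grown weights get fresh small random values (SET re-initializes them)
    weight.data.view(-1)[grow_idx] = torch.randn(n_update, device=weight.device) * 0.01
    return mask
```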
Features:
Code: https://github.com/Shiweiliuiiiiiii/In-Time-Over-Parameterization
Approach:
CYCLIC GAP:
PARALLEL GAP: split into k partitions over k nodes; each partition is dense on a different node (see the sketch below).
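As I understand the scheme (a hedged reconstruction, not the paper's code): the parameters are split into k partitions; cyclic GaP runs on a single worker and makes one partition dense at a time, cycling through all of them, while parallel GaP assigns each partition to a different node. A rough sketch of the cyclic schedule, with hypothetical `train_fn` and `prune_fn` helpers supplied by the caller:

```python
def cyclic_gap(partitions, masks, train_fn, prune_fn, cycles=2):
    """Cyclic grow-and-prune schedule (a sketch under my reading of the method).

    partitions: list of k disjoint groups of layer names
    masks:      dict {layer_name: 0/1 tensor} with the current sparse connectivity
    """
    k = len(partitions)
    for _ in range(cycles):
        for i in range(k):
            # grow: the i-th partition becomes temporarily dense
            for name in partitions[i]:
                masks[name].fill_(1)

            # train with partition i dense and all other partitions sparse
            train_fn(masks)

            # prune: bring partition i back to its target sparsity
            # (e.g., by keeping the largest-magnitude weights)
            for name in partitions[i]:
                masks[name] = prune_fn(name)
    return masks
```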
Features: