For example, when using GLP to prune 2 or 4 layers from BERT, the resulting model trained on RTE is not only faster and smaller but also ...