In this paper, we speed up the Transformer via a fast and lightweight attention model. More specifically, we share attention weights in adjacent layers and enable ...