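The ETC (Extended Transformer Construction) model [3] is another variant in the Sparse Transformer family. It introduces a global-local attention mechanism: a small set of global tokens attends to, and is attended by, the whole input, while each token of the long input attends to the global tokens and to its neighbours within a fixed local radius.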
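To make the global-local pattern concrete, here is a minimal sketch of the attention mask it implies. It assumes the commonly described layout in which global tokens are placed before the long input, global tokens attend to and are attended by every position, and long tokens attend to each other only within a local radius. The function name `global_local_mask` and its parameters are illustrative and are not taken from the ETC code.

```python
def global_local_mask(n_global, n_long, radius):
    """Build a boolean attention mask over [global tokens | long tokens].

    mask[i][j] is True when query position i may attend to key position j.
    Hypothetical helper for illustration only.
    """
    n = n_global + n_long
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            i_is_global = i < n_global
            j_is_global = j < n_global
            if i_is_global or j_is_global:
                # Global-to-global, global-to-long, and long-to-global
                # attention is unrestricted in this sketch.
                mask[i][j] = True
            else:
                # Long-to-long attention is limited to a local window.
                mask[i][j] = abs(i - j) <= radius
    return mask


# Example: 2 global tokens, 6 long tokens, local radius of 1.
mask = global_local_mask(n_global=2, n_long=6, radius=1)
allowed = sum(sum(row) for row in mask)  # number of permitted query-key pairs
```

With a fixed number of global tokens and a fixed radius, the number of permitted pairs grows linearly with the length of the long input rather than quadratically, which is the point of the sparse pattern.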