Transformers have the potential to learn longer-term dependencies, but in language modeling they are limited by a fixed-length context.