Maybe. But RNNs aren't. Transformers learn "pseudo-temporal" relationships; they lack the true recurrent gradient that RNNs have, ...
確定! 回上一頁