Apparently, if you copy AdaFactor from fairseq, as recommended by the T5 authors, ... T5 is an encoder-decoder model with a language modeling head on top.