We use fairseq (Ott et al., 2019) to train a small Transformer model (Vaswani et al., 2017) with 6 encoder and 6 decoder layers, a model dimension of 256, ...
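As a rough illustration of this setup, a fairseq-train invocation along the following lines would configure a Transformer of this size. Only the layer count and model dimension come from the description above; the data directory name, optimizer, learning-rate schedule, criterion, and batching settings are illustrative placeholders drawn from fairseq's standard translation examples, not the exact configuration used here.

    # Sketch of a fairseq-train call for a model of this size (assumed, not the exact command used).
    # "data-bin/example" is a hypothetical preprocessed data directory; the optimizer, LR schedule,
    # label smoothing, and token budget are illustrative defaults from fairseq's translation examples.
    fairseq-train data-bin/example \
        --arch transformer \
        --encoder-layers 6 --decoder-layers 6 \
        --encoder-embed-dim 256 --decoder-embed-dim 256 \
        --optimizer adam --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
        --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
        --max-tokens 4096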