The DisCo transformer is trained to predict every output token given an arbitrary subset of the other reference tokens. We also develop the ...