... task is more efficient than MLM because it is defined over all input tokens rather than just the small subset that was masked out.
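To make the efficiency argument concrete, the following minimal sketch counts how many positions in a sequence receive a training signal under each kind of objective. It is an illustration only, not the authors' implementation: the function name `supervised_positions`, the 15% mask rate, and the sequence length of 128 are assumptions chosen for the example.

```python
import random


def supervised_positions(seq_len, mask_rate=0.15, seed=0):
    """Return the positions that contribute to the loss under each objective.

    mask_rate=0.15 is an assumed, illustrative value; it is not taken
    from the text above.
    """
    rng = random.Random(seed)
    n_masked = max(1, round(seq_len * mask_rate))

    # MLM-style objective: only the masked-out subset is predicted.
    mlm_positions = sorted(rng.sample(range(seq_len), n_masked))

    # Objective defined over all input tokens: every position is predicted.
    all_positions = list(range(seq_len))

    return mlm_positions, all_positions


mlm_pos, all_pos = supervised_positions(seq_len=128)
print(f"MLM-style signal: {len(mlm_pos)} of {len(all_pos)} positions")
print(f"All-token signal: {len(all_pos)} of {len(all_pos)} positions")
```

Under these assumed settings, the masked-only objective learns from roughly one position in seven per example, while an objective defined over all input tokens learns from every position in the same forward pass.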