Given the shard of training examples, this function computes the loss for both the masked language modeling and next sentence prediction tasks.
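As a rough illustration of what such a computation can look like, the sketch below uses only the Python standard library. It is not the function referred to above: the names `cross_entropy` and `bert_batch_loss`, the argument layout, and the choice to simply add the two losses are assumptions made for this example. It also assumes that padded masked positions carry a weight of 0 so they do not contribute to the masked language modeling loss.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy of one example from raw logits (stable log-sum-exp)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def bert_batch_loss(mlm_logits, mlm_labels, mlm_weights, nsp_logits, nsp_labels):
    """Hypothetical helper: return (mlm_loss, nsp_loss, total_loss).

    mlm_logits:  one vocabulary-sized list of logits per masked position
    mlm_labels:  true token id for each masked position
    mlm_weights: 1.0 for real masked positions, 0.0 for padding (assumed)
    nsp_logits:  one 2-element list of logits per sentence pair
    nsp_labels:  0 or 1 for each sentence pair
    """
    # Masked language modeling: weighted average over non-padded positions.
    weighted = [
        w * cross_entropy(logits, label)
        for logits, label, w in zip(mlm_logits, mlm_labels, mlm_weights)
    ]
    mlm_loss = sum(weighted) / (sum(mlm_weights) + 1e-8)

    # Next sentence prediction: plain average over sentence pairs.
    nsp_losses = [
        cross_entropy(logits, label)
        for logits, label in zip(nsp_logits, nsp_labels)
    ]
    nsp_loss = sum(nsp_losses) / len(nsp_losses)

    # Assumption: the final loss is the unweighted sum of both task losses.
    return mlm_loss, nsp_loss, mlm_loss + nsp_loss
```

The small constant in the denominator guards against division by zero when every masked position in the shard is padding.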