gather only applies during distributed training and the result tensor will be the one gathered across processes if gather=True (as a result, the batch size ...
確定! 回上一頁