Apparently, the weight_decay in the AdamW function [...] has the same impact as L2 regularization. This claim isn't entirely correct.
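
To make the distinction concrete, here is a minimal scalar sketch of one Adam-style update, written in plain Python rather than with any real optimizer class. The function name `adam_step`, its parameters `l2` and `decoupled_wd`, and the toy numbers are assumptions chosen for illustration only. It contrasts an L2 penalty folded into the gradient with an AdamW-style decay applied directly to the weight.

```python
import math

def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8,
              l2=0.0, decoupled_wd=0.0):
    """One scalar Adam update (illustrative sketch, not a library API).

    l2:           coefficient of an L2 penalty added to the gradient.
    decoupled_wd: AdamW-style decay applied directly to the weight.
    """
    # L2 regularization: the penalty term goes through the adaptive scaling.
    g = grad + l2 * w
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
    # Decoupled weight decay: shrink the weight outside the adaptive scaling.
    w = w - lr * decoupled_wd * w
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Toy setup (assumed values): weight 1.0, loss gradient 10.0, first step.
w_plain, _, _ = adam_step(1.0, 10.0, 0.0, 0.0, t=1)
w_l2, _, _ = adam_step(1.0, 10.0, 0.0, 0.0, t=1, l2=0.1)
w_adamw, _, _ = adam_step(1.0, 10.0, 0.0, 0.0, t=1, decoupled_wd=0.1)

print(w_plain, w_l2, w_adamw)
```

In this sketch, the first Adam step has magnitude close to `lr` regardless of the gradient's size, so the L2 term is almost entirely normalized away and `w_l2` ends up essentially equal to `w_plain`. The decoupled variant still shrinks the weight by `lr * decoupled_wd * w`. With plain SGD the two forms would give the same update, which is likely where the claim of equivalence comes from.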