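We use the AdamW [16] optimizer. Unlike Adam with L2 regularization, which adds the penalty term to the gradient, AdamW decouples weight decay from the gradient-based update: the decay, controlled by the weight decay parameter λ, is applied directly to the weights.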
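To make the role of λ concrete, the following is a minimal sketch of one decoupled weight decay step in plain Python. It is not the implementation used in this work. The function name `adamw_step`, the argument names, and the default hyperparameter values are illustrative assumptions, and the learning-rate schedule multiplier is omitted for brevity.

```python
import math

def adamw_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, lam=1e-2):
    """Apply one AdamW-style update to a list of scalar weights.

    w, g, m, v: lists of weights, gradients, first and second moments.
    t: 1-based step count (used for bias correction).
    lam: weight decay parameter (lambda); default is an assumed value.
    Returns the updated (w, m, v) as new lists.
    """
    new_w, new_m, new_v = [], [], []
    for wi, gi, mi, vi in zip(w, g, m, v):
        # Moment estimates use the raw gradient only; no L2 term is added.
        mi = beta1 * mi + (1 - beta1) * gi
        vi = beta2 * vi + (1 - beta2) * gi * gi
        # Bias-corrected moments.
        m_hat = mi / (1 - beta1 ** t)
        v_hat = vi / (1 - beta2 ** t)
        # Decoupled decay: lam * wi is applied to the weight directly,
        # outside the adaptive rescaling by sqrt(v_hat).
        wi = wi - lr * (m_hat / (math.sqrt(v_hat) + eps) + lam * wi)
        new_w.append(wi)
        new_m.append(mi)
        new_v.append(vi)
    return new_w, new_m, new_v
```

With a zero gradient, the step reduces to shrinking each weight by the factor (1 − lr·λ), which shows that the decay acts independently of the gradient statistics. Under L2 regularization in Adam, by contrast, the penalty would be folded into `g` and rescaled by the adaptive denominator.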