L2 regularization and weight decay regularization are equivalent for ... is not the case for adaptive gradient algorithms, such as Adam.
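The contrast can be checked numerically. For plain SGD with learning rate lr, an L2 penalty (lam / 2) * ||w||^2 adds lam * w to the gradient, so w <- w - lr * (grad + lam * w) = (1 - lr * lam) * w - lr * grad. That is exactly weight decay with factor lr * lam. In Adam, the added lam * w passes through the adaptive moments and is normalized along with the loss gradient, so the equivalence breaks. The following is a minimal sketch, not the paper's code. It assumes a hypothetical toy quadratic loss, hand-written updates (standard Adam with bias correction), and one common AdamW-style form of decoupled decay, where lam * w is applied outside the adaptive normalization. The function names and hyperparameters are illustrative.

```python
import math

# Hypothetical toy loss (an assumption for illustration):
#   f(w) = 0.5 * sum(c_i * w_i**2)
# with very different curvatures c_i, so Adam's per-coordinate scaling matters.
CURV = [100.0, 1.0, 0.01]


def grad_f(w):
    """Gradient of the toy loss f, without any regularization term."""
    return [c * wi for c, wi in zip(CURV, w)]


def sgd_l2(w, lr, lam, steps):
    """SGD on f(w) + (lam / 2) * ||w||^2: the penalty adds lam * w to the gradient."""
    w = list(w)
    for _ in range(steps):
        g = grad_f(w)
        w = [wi - lr * (gi + lam * wi) for wi, gi in zip(w, g)]
    return w


def sgd_weight_decay(w, lr, decay, steps):
    """SGD on f(w) alone, with each step also shrinking w by the factor (1 - decay)."""
    w = list(w)
    for _ in range(steps):
        g = grad_f(w)
        w = [(1.0 - decay) * wi - lr * gi for wi, gi in zip(w, g)]
    return w


def adam(w, lr, lam, steps, decoupled, b1=0.9, b2=0.999, eps=1e-8):
    """Adam with bias correction.

    decoupled=False: L2 regularization; lam * w is added to the gradient and
                     therefore passes through the adaptive moments m and v.
    decoupled=True:  AdamW-style decoupled weight decay; lam * w is applied to
                     the weights directly, outside the adaptive normalization.
    """
    w = list(w)
    m = [0.0] * len(w)
    v = [0.0] * len(w)
    for t in range(1, steps + 1):
        g = grad_f(w)
        if not decoupled:
            g = [gi + lam * wi for gi, wi in zip(g, w)]
        m = [b1 * mi + (1 - b1) * gi for mi, gi in zip(m, g)]
        v = [b2 * vi + (1 - b2) * gi * gi for vi, gi in zip(v, g)]
        m_hat = [mi / (1 - b1 ** t) for mi in m]
        v_hat = [vi / (1 - b2 ** t) for vi in v]
        step = [mh / (math.sqrt(vh) + eps) for mh, vh in zip(m_hat, v_hat)]
        if decoupled:
            w = [wi - lr * (si + lam * wi) for wi, si in zip(w, step)]
        else:
            w = [wi - lr * si for wi, si in zip(w, step)]
    return w


if __name__ == "__main__":
    w0 = [1.0, 1.0, 1.0]
    lr, lam = 0.005, 0.1

    # SGD: L2 with coefficient lam should match weight decay with factor lr * lam.
    a = sgd_l2(w0, lr, lam, 200)
    b = sgd_weight_decay(w0, lr, lr * lam, 200)
    print("SGD max |diff|:", max(abs(x - y) for x, y in zip(a, b)))

    # Adam: the two forms of regularization should give different weights.
    c = adam(w0, lr, lam, 50, decoupled=False)
    d = adam(w0, lr, lam, 50, decoupled=True)
    print("Adam max |diff|:", max(abs(x - y) for x, y in zip(c, d)))
```

In this particular toy, the gradient of f is proportional to w in every coordinate. The L2 term therefore only rescales each coordinate's gradient, and Adam's update is invariant to such rescaling (up to eps). As a result, L2-regularized Adam follows essentially the same trajectory as unregularized Adam here, while the decoupled variant does shrink the weights. That exact cancellation is specific to this toy. In general, the penalty gradient is normalized together with the loss gradient rather than applied at a fixed rate.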