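The way weight decay is implemented in Adam was carried over from good old vanilla SGD optimizers, where adding an L2 penalty to the gradient is equivalent to decaying the weights. With Adam's adaptive per-parameter scaling that equivalence no longer holds, so the implementation isn't mathematically correct. AdamW fixes this mistake by applying the decay directly to the weights, decoupled from the gradient update.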
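To make the difference concrete, here is a minimal sketch of one update step for a single scalar weight, assuming the usual Adam hyperparameters. The function names `adam_l2_step` and `adamw_step` are made up for illustration and are not any library's API; real implementations may order the operations slightly differently.

```python
import math


def adam_l2_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
                 eps=1e-8, weight_decay=1e-2):
    """Adam with the SGD-style L2 penalty folded into the gradient (illustrative)."""
    g = grad + weight_decay * w              # penalty goes through the moment estimates
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """Adam with decoupled weight decay, in the spirit of AdamW (illustrative)."""
    m = beta1 * m + (1 - beta1) * grad       # moments see only the loss gradient
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    adaptive_step = lr * m_hat / (math.sqrt(v_hat) + eps)
    decay_step = lr * weight_decay * w       # decay is not rescaled by sqrt(v_hat)
    return w - adaptive_step - decay_step, m, v


# First step (t=1) from w=1.0 with a zero loss gradient, so only the decay acts.
w_l2, _, _ = adam_l2_step(1.0, 0.0, 0.0, 0.0, t=1)
w_decoupled, _, _ = adamw_step(1.0, 0.0, 0.0, 0.0, t=1)
print(w_l2, w_decoupled)
```

With a zero loss gradient, the L2 variant moves the weight by roughly `lr` no matter how small `weight_decay` is (as long as the penalty is much larger than `eps`), because the penalty gets normalized by `sqrt(v_hat)` like any other gradient. The decoupled variant shrinks the weight by `lr * weight_decay * w`, which is what "weight decay" is supposed to mean.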