apparently the weight_decay in the AdamW function [. ... we demonstrate this is not the case for adaptive gradient algorithms, such as Adam.
確定! 回上一頁