I found that Task uses Adam as the default optimizer, but AFAIK both PyTorch and TensorFlow have a wrong implementation of Adam w.r.t. weight decay, so AdamW comes ...
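To make the weight decay point concrete, here is a minimal pure-Python sketch of one optimizer step on a single scalar parameter. `adam_step` and its arguments are hypothetical names for illustration, not Task's API or any library's. The sketch contrasts two choices. The first folds weight decay into the gradient as an L2 term, which is what PyTorch's `torch.optim.Adam` does with its `weight_decay` argument. The second is the decoupled decay used by AdamW, applied directly to the parameter.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0, decoupled=False):
    """One Adam / AdamW step on a scalar parameter (illustrative sketch only).

    decoupled=False: weight decay is added to the gradient as an L2 term,
                     so it flows through m and v and gets rescaled by sqrt(v_hat).
    decoupled=True:  AdamW-style, the parameter is shrunk directly by lr * weight_decay,
                     independent of the adaptive denominator.
    """
    if not decoupled:
        # Coupled L2: the decay term becomes part of the gradient statistics.
        grad = grad + weight_decay * theta

    # Standard Adam moment estimates with bias correction (t starts at 1).
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)

    if decoupled:
        # Decoupled decay: applied to the weights, not rescaled by sqrt(v_hat).
        theta = theta - lr * weight_decay * theta

    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v
```

For example, with `theta=1.0`, a zero loss gradient, `lr=0.1` and `weight_decay=0.1`, the coupled version moves `theta` to about 0.9 on the first step (Adam's normalization turns the tiny decay gradient into a full `lr`-sized step), while the decoupled version moves it to 0.99, i.e. exactly `lr * weight_decay` of shrinkage. With `weight_decay=0` the two variants are identical, so the difference only shows up once decay is turned on.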