Models are trained for 100,000 steps with a batch size of 64, using the AdamW optimizer and a linear learning-rate scheduler with an initial learning rate of 2e-4.
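As a minimal sketch of the schedule above, the learning rate at a given step can be computed as follows; this assumes the linear scheduler decays from the initial value to zero over the full 100,000 steps, since the source does not state the final learning rate:

```python
TOTAL_STEPS = 100_000   # training steps from the setup above
INITIAL_LR = 2e-4       # initial learning rate from the setup above

def linear_lr(step: int,
              total_steps: int = TOTAL_STEPS,
              initial_lr: float = INITIAL_LR) -> float:
    """Linearly decay the learning rate from initial_lr to 0 over total_steps.

    Assumes decay to zero at the final step (an assumption, not stated
    in the source).
    """
    return initial_lr * max(0.0, 1.0 - step / total_steps)

# The rate starts at the initial value and falls linearly:
print(linear_lr(0))        # 0.0002
print(linear_lr(50_000))   # 0.0001
print(linear_lr(100_000))  # 0.0
```

In practice this corresponds to the `linear` schedule commonly exposed by deep-learning frameworks (e.g. PyTorch's `LambdaLR` with a linear multiplier); the exact warmup or end value, if any, would follow the authors' implementation.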