The specific model used here is DistilBERT, a distilled version of BERT that is smaller and faster while retaining most of its accuracy ... The model was trained for 3 epochs with a batch size of 16 and a learning rate of ...
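A training setup like the one described could be sketched as a Hugging Face `TrainingArguments` configuration. This is a hypothetical sketch, not the authors' exact code: only the epoch count (3) and batch size (16) come from the text, while the learning rate is elided in the source, so the value below is purely a placeholder, and the output directory name is assumed.

```python
# Hypothetical fine-tuning configuration sketch (not the original code).
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="distilbert-finetuned",  # assumed name, not from the source
    num_train_epochs=3,                 # stated in the text
    per_device_train_batch_size=16,     # stated in the text
    learning_rate=5e-5,                 # PLACEHOLDER: the actual value is elided in the source
)
```

These arguments would then be passed to a `Trainer` along with the DistilBERT model and a tokenized dataset.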