DistilBERT retains about 97% of BERT's performance while using roughly 40% fewer parameters (6 transformer layers instead of BERT-base's 12). The size reduction comes from knowledge distillation: the smaller student model is trained to mimic the output distribution of the larger teacher model.
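As a minimal sketch of the idea, the classic soft-target distillation loss (Hinton et al.) measures the KL divergence between temperature-softened teacher and student output distributions; the function names and plain-list logits here are illustrative, not DistilBERT's actual training code (which also combines this with a language-modeling loss and a cosine embedding loss):

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax: higher T flattens the distribution,
    # exposing more of the teacher's "dark knowledge" about wrong classes.
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) on softened distributions, scaled by T^2
    # so gradients keep a comparable magnitude across temperatures.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return temperature ** 2 * sum(
        pi * math.log(pi / qi) for pi, qi in zip(p, q)
    )
```

When the student's logits match the teacher's exactly, the loss is zero; training pushes the student toward that point while it stays much smaller than the teacher.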