In the paper, they used model sizes ranging from 125M up to 175B parameters (the ... a special end-of-text token and the symbols learned with 50,000 merges.
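Each byte-pair-encoding (BPE) merge adds one new symbol to the vocabulary, so the final vocabulary size is the base symbols plus the number of merges plus any special tokens. Below is a minimal byte-level BPE sketch in Python. It assumes a base vocabulary of the 256 possible byte values, as in GPT-2's byte-level BPE, and uses naive whitespace splitting in place of a real pre-tokenizer. The helper names `learn_bpe` and `vocab_size` are hypothetical, and this is an illustration of the technique, not the paper's tokenizer code.

```python
from collections import Counter


def learn_bpe(corpus: str, num_merges: int) -> list[tuple[bytes, bytes]]:
    """Minimal byte-level BPE sketch (assumption: whitespace pre-tokenization)."""
    # Start every word as a sequence of single-byte symbols (the 256-byte base vocabulary).
    words = [tuple(bytes([b]) for b in w.encode("utf-8")) for w in corpus.split()]
    word_counts = Counter(words)
    merges: list[tuple[bytes, bytes]] = []

    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pair_counts: Counter = Counter()
        for word, freq in word_counts.items():
            for a, b in zip(word, word[1:]):
                pair_counts[(a, b)] += freq
        if not pair_counts:
            break  # every word is already a single symbol; nothing left to merge

        # Pick the most frequent pair (ties go to the pair seen first).
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)

        # Rewrite every word, replacing occurrences of the best pair with one merged symbol.
        new_counts: Counter = Counter()
        for word, freq in word_counts.items():
            merged, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_counts[tuple(merged)] += freq
        word_counts = new_counts

    return merges


def vocab_size(num_merges: int, num_base: int = 256, num_special: int = 1) -> int:
    # Base byte symbols + one new symbol per merge + special tokens such as end-of-text.
    return num_base + num_merges + num_special


if __name__ == "__main__":
    print(learn_bpe("low lower lowest low low", num_merges=3))
    print(vocab_size(50_000))  # 50257
```

With 50,000 merges, 256 base bytes, and one end-of-text token, the arithmetic gives 50,257 entries, which matches the vocabulary size of the GPT-2 tokenizer.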