DALL·E is a transformer language model. It receives both the text and the image as a single stream of data containing up to 1280 tokens, and is ...
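
As a rough illustration of the single-stream idea, the sketch below concatenates text tokens and image tokens into one sequence and enforces the 1280-token limit mentioned above. The function name `build_stream`, the vocabulary sizes, and the id-offset scheme are assumptions made for this example only; they are not taken from the text and may differ from how the model actually encodes its input.

```python
from typing import List

# From the text: the combined stream holds up to 1280 tokens.
MAX_STREAM_LEN = 1280

# Hypothetical vocabulary sizes, chosen only for illustration.
TEXT_VOCAB_SIZE = 16384
IMAGE_VOCAB_SIZE = 8192


def build_stream(text_tokens: List[int], image_tokens: List[int]) -> List[int]:
    """Concatenate text and image tokens into one sequence.

    Image token ids are shifted by TEXT_VOCAB_SIZE (an assumed scheme) so
    that both modalities share one id space without collisions.
    """
    if any(t < 0 or t >= TEXT_VOCAB_SIZE for t in text_tokens):
        raise ValueError("text token id out of range")
    if any(t < 0 or t >= IMAGE_VOCAB_SIZE for t in image_tokens):
        raise ValueError("image token id out of range")

    # Text comes first, followed by the offset image tokens.
    stream = list(text_tokens) + [TEXT_VOCAB_SIZE + t for t in image_tokens]

    if len(stream) > MAX_STREAM_LEN:
        raise ValueError(
            f"stream has {len(stream)} tokens, limit is {MAX_STREAM_LEN}"
        )
    return stream


if __name__ == "__main__":
    # Three text tokens followed by two image tokens.
    print(build_stream([1, 2, 3], [0, 5]))
```

A transformer consuming such a stream sees one flat sequence of integer ids, so no separate image pathway is needed at the input; the offset is simply one way to keep the two kinds of token distinguishable.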