While many models are uni-modal (i.e., they work with only one type of data, such as text, images, or audio), CLIP is multi-modal, meaning it can embed ...