they mask patches of an image and, through an autoencoder, predict the masked patches. In the spirit of "masked language modeling", ...
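To make the masking step concrete, here is a minimal sketch in plain Python, not the authors' implementation. It splits a toy image into patches, hides a random subset, and scores a prediction only on the hidden patches. The helper names (`patchify`, `random_mask`, `masked_mse`), the 75% mask ratio, and the all-zeros "prediction" standing in for the autoencoder output are all assumptions made for illustration.

```python
import random

def patchify(image, patch):
    """Split a 2D image (list of rows) into non-overlapping patch x patch blocks, row-major."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([row[left:left + patch] for row in image[top:top + patch]])
    return patches

def random_mask(num_patches, mask_ratio, rng):
    """Pick patches to hide; return (masked indices, visible indices), both sorted."""
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(rng.sample(range(num_patches), num_masked))
    masked_set = set(masked)
    visible = [i for i in range(num_patches) if i not in masked_set]
    return masked, visible

def masked_mse(pred_patches, target_patches, masked):
    """Mean squared error over the masked patches only."""
    total, count = 0.0, 0
    for i in masked:
        for pred_row, target_row in zip(pred_patches[i], target_patches[i]):
            for p, t in zip(pred_row, target_row):
                total += (p - t) ** 2
                count += 1
    return total / count if count else 0.0

# Toy 8x8 "image" cut into four 4x4 patches.
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
patches = patchify(image, patch=4)

# Assumed mask ratio of 0.75; the value is illustrative only.
masked, visible = random_mask(len(patches), mask_ratio=0.75, rng=random.Random(0))

# Placeholder for the autoencoder output: it would see only the visible
# patches and emit a reconstruction for every patch position.
pred = [[[0.0] * 4 for _ in range(4)] for _ in patches]
loss = masked_mse(pred, patches, masked)
```

In a real model the encoder would receive only the patches listed in `visible`, and the reconstruction loss would be driven by the patches listed in `masked`, which is what `masked_mse` mimics here.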