We introduce a model for bidirectional retrieval of images and sentences through a multi-modal embedding of visual and natural language data.
確定! 回上一頁