It aims to localize objects described in the sentence to visual regions ... we learn the object-relevant association by causal intervention ...
確定! 回上一頁