Our verb representation yields more discriminative cues for the final detection task. Secondly, the number of verbs and objects within a single image is ...