We show that the image regions a verb prediction model identifies as salient for a given verb correlate with the regions fixated by human observers performing a ...
確定! 回上一頁