a performance of 0.719 mAP. We also demonstrate our model's ability to attend to the relevant video parts in order to determine the adverb for a given ...