We encourage researchers to leverage the large amount of noisy video-level labels in the training set to train models for temporal localization.
確定! 回上一頁