... Auto-captions on GIF dataset provided by the Challenge (as pre-training data) and the public MSR-VTT benchmark (as training data for the downstream task).