matching and temporal order discrimination, to promote the training of the grounding model. The cross-modal matching task leverages the content.