Temporal sentence grounding in videos aims to detect and localize the one target video segment that semantically corresponds to a given sentence.