Visual question answering (VQA) models, in particular modular ones, are commonly trained on large-scale datasets to achieve state-of-the-art performance.
確定! 回上一頁