We find the COCO Captioning dataset [9] unsuitable, as only an estimated 2.7% of its captions mention OCR tokens present in the image, and in total.