Learning Word-Like Units from Joint Audio-Visual Analysis · David Harwath, James Glass. Abstract. Given a collection of images and spoken audio captions, we ...