Thompson sampling is an algorithm for online decision problems where ... The agent enjoys a reward r_t = r(y_t), where r is a known function. The agent is ...
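As a rough illustration of the setting described above, here is a minimal sketch of Thompson sampling on a Bernoulli bandit. Everything specific in it is an assumption made for the example and does not come from the source: the Bernoulli outcome model, the uniform Beta(1, 1) prior, the identity reward function r(y) = y, and the names `thompson_bernoulli`, `true_means`, and `horizon`.

```python
import random


def thompson_bernoulli(true_means, horizon, seed=0):
    """Sketch of Thompson sampling for a Bernoulli bandit (illustrative only).

    true_means: hypothetical success probability of each action, hidden from the agent.
    horizon: number of rounds to play.
    Returns the Beta posterior parameters per action and the total reward.
    """
    rng = random.Random(seed)
    k = len(true_means)
    # Assumed prior: Beta(1, 1), i.e. uniform, for every action.
    alpha = [1] * k
    beta = [1] * k
    total_reward = 0

    for _ in range(horizon):
        # Draw one plausible mean per action from its current posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # Act greedily with respect to the sampled means.
        action = max(range(k), key=lambda i: samples[i])
        # Observe outcome y_t from the environment.
        y = 1 if rng.random() < true_means[action] else 0
        # Known reward function; assumed here to be the identity, r(y) = y.
        reward = y
        total_reward += reward
        # Conjugate Beta-Bernoulli posterior update for the chosen action.
        alpha[action] += y
        beta[action] += 1 - y

    return alpha, beta, total_reward


if __name__ == "__main__":
    alpha, beta, total = thompson_bernoulli([0.2, 0.8], horizon=500)
    print("posterior alpha:", alpha)
    print("posterior beta:", beta)
    print("total reward:", total)
```

Sampling from the posterior, rather than acting on its mean, is what lets the agent keep occasionally trying actions it is still uncertain about. The pull counts for each action can be read off as alpha[i] + beta[i] - 2.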