Trading off exploration and exploitation in an unknown environment is key to maximising expected return during learning. A Bayes-optimal policy, ...
確定! 回上一頁