... (small) action space
• Gradient-free optimization, such as the cross-entropy method (CEM)
• Can be used as a guide to improve the policy (policy iteration, later)
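As a rough illustration of the gradient-free option, here is a minimal sketch of the cross-entropy method, assuming the goal is to pick a continuous action that maximizes some score (for example a learned value estimate at a fixed state). The names `cem_maximize` and `toy_score`, the diagonal Gaussian sampling distribution, and all hyperparameters are assumptions chosen for illustration, not something taken from the slides.

```python
import random
import statistics


def cem_maximize(score, dim, iterations=30, population=64, elite_frac=0.125, seed=0):
    """Cross-entropy method: gradient-free maximization of score(x) over R^dim."""
    rng = random.Random(seed)
    mean = [0.0] * dim   # assumed starting guess
    std = [1.0] * dim    # assumed initial spread
    n_elite = max(2, int(population * elite_frac))
    for _ in range(iterations):
        # Sample candidate actions from the current diagonal Gaussian.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(population)]
        # Keep the highest-scoring candidates (the "elites").
        elites = sorted(samples, key=score, reverse=True)[:n_elite]
        # Refit the Gaussian to the elites, one dimension at a time.
        columns = list(zip(*elites))
        mean = [statistics.fmean(c) for c in columns]
        std = [statistics.pstdev(c) + 1e-6 for c in columns]
    return mean


def toy_score(action):
    # Hypothetical stand-in for a value estimate; its maximum is at (0.5, -0.3).
    return -((action[0] - 0.5) ** 2 + (action[1] + 0.3) ** 2)


best = cem_maximize(toy_score, dim=2)
print(best)
```

Only evaluations of `score` are needed, never its gradient, which is why this kind of search is usable when the score is a black box. The cost is that each iteration evaluates the whole population, so it scales poorly to high-dimensional action spaces.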