We use an ensemble of policy networks to model risk as the variance of value functions. We show through experiments that a risk-averse agent is ...
確定! 回上一頁