We propose a new deep deterministic actor-critic algorithm ... $\gamma^{i-t} r(s_i, a_i)$, where $\gamma \in [0, 1]$ denotes the discount factor. In RL, the.
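To make the role of the discount factor concrete, here is a minimal sketch. It assumes the elided text sums the term above over steps i = t, ..., T, which is the usual discounted return; the source fragment does not confirm this. The function name `discounted_return` and the reward values are hypothetical, chosen only for illustration.

```python
def discounted_return(rewards, gamma, t=0):
    """Sum gamma**(i - t) * rewards[i] over i = t .. len(rewards) - 1.

    Assumption: rewards[i] stands in for r(s_i, a_i), and the sum runs to
    the end of the list. The source only shows the summand.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return sum(gamma ** (i - t) * r for i, r in enumerate(rewards) if i >= t)


rewards = [1.0, 0.0, 2.0]  # hypothetical per-step rewards
print(discounted_return(rewards, gamma=0.5))  # 1.0 + 0.5*0.0 + 0.25*2.0 = 1.5
```

With gamma = 0 only the immediate reward counts, while gamma = 1 weights every later reward equally.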