The difference between the two sides of the equality is known as the temporal difference error, $\delta$: $\delta = Q(s, a) - (r + \gamma \max_a Q(s'$ ...
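To make the quantity concrete, here is a minimal sketch of the temporal difference error for a small tabular Q function. The formula above is cut off, so the sketch assumes the standard Q-learning target, the reward plus the discounted maximum Q-value over actions in the next state. The names `td_error` and `q_table` and all the numbers are hypothetical, chosen only for illustration.

```python
def td_error(q, state, action, reward, next_state, gamma):
    """Return Q(s, a) minus the target, matching the sign convention above."""
    # Assumption: the target is the usual Q-learning one,
    # r + gamma * max over next actions of Q(s', a').
    target = reward + gamma * max(q[next_state])
    return q[state][action] - target


# Hypothetical Q-table: 2 states (rows) by 2 actions (columns).
q_table = [
    [1.0, 2.0],
    [0.5, 3.0],
]

delta = td_error(q_table, state=0, action=1, reward=1.0, next_state=1, gamma=0.5)
print(delta)  # -0.5, since Q(s, a) = 2.0 and the target is 1.0 + 0.5 * 3.0 = 2.5
```

A negative value here means the current estimate Q(s, a) is below its target, so an update that shrinks the error would raise that estimate.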