In general, a less frequent updates of the target network would result in a more stable learning process. For example, Hernandez-Garcia (2019) ...
確定! 回上一頁