It uses off-policy data and the Bellman equation to learn the Q-function, and uses the Q-function to learn the policy.
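The sketch below is a rough illustration of that two-part scheme. It rests on several assumptions: the state and action are one-dimensional, the Q-function is linear over hypothetical hand-picked features, the policy is a hypothetical linear deterministic function mu(s), and the task is a toy one-step problem with reward -(a - s)^2. Transitions come from a uniform random behavior policy, so the data are off-policy. The Q-function is fit by gradient descent toward the Bellman target r + gamma * (1 - done) * Q(s', mu(s')), and the policy is then adjusted by gradient ascent on Q(s, mu(s)). It leaves out neural networks, target networks, and exploration noise. Every name in it is illustrative and does not come from any library.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Transition:
    s: float      # state
    a: float      # action chosen by the behavior policy
    r: float      # reward
    s2: float     # next state
    done: bool    # True if the episode ended after this step


def features(s: float, a: float) -> List[float]:
    # Hypothetical hand-picked features; the a*a term lets Q have a maximum in a.
    return [1.0, s, s * s, a, s * a, a * a]


def q_value(w: List[float], s: float, a: float) -> float:
    return sum(wi * fi for wi, fi in zip(w, features(s, a)))


def policy(theta: List[float], s: float) -> float:
    # Hypothetical deterministic linear policy mu(s) = theta0 + theta1 * s.
    return theta[0] + theta[1] * s


def bellman_target(w, theta, t: Transition, gamma: float) -> float:
    # y = r + gamma * (1 - done) * Q(s', mu(s'))
    next_q = q_value(w, t.s2, policy(theta, t.s2))
    return t.r + gamma * (0.0 if t.done else 1.0) * next_q


def q_step(w, theta, batch, gamma, lr):
    # One gradient-descent step on the mean squared Bellman error;
    # the target y is treated as a constant (semi-gradient update).
    grad = [0.0] * len(w)
    for t in batch:
        err = q_value(w, t.s, t.a) - bellman_target(w, theta, t, gamma)
        for i, f in enumerate(features(t.s, t.a)):
            grad[i] += 2.0 * err * f / len(batch)
    return [wi - lr * gi for wi, gi in zip(w, grad)]


def policy_step(w, theta, batch, lr):
    # Gradient ascent on mean Q(s, mu(s)) via the chain rule:
    # dQ/da = w3 + w4*s + 2*w5*a, and da/dtheta = (1, s).
    grad = [0.0, 0.0]
    for t in batch:
        a = policy(theta, t.s)
        dq_da = w[3] + w[4] * t.s + 2.0 * w[5] * a
        grad[0] += dq_da / len(batch)
        grad[1] += dq_da * t.s / len(batch)
    return [th + lr * g for th, g in zip(theta, grad)]


def train(steps=1500, batch_size=32, gamma=0.99, seed=0):
    rng = random.Random(seed)
    # Off-policy data: actions come from a uniform random behavior policy,
    # not from the policy being learned. Toy one-step task: r = -(a - s)^2.
    buffer = []
    for _ in range(2000):
        s, a = rng.uniform(-1, 1), rng.uniform(-1, 1)
        buffer.append(Transition(s, a, -(a - s) ** 2, rng.uniform(-1, 1), True))
    w, theta = [0.0] * 6, [0.0, 0.0]
    for _ in range(steps):
        batch = rng.sample(buffer, batch_size)
        w = q_step(w, theta, batch, gamma, lr=0.2)       # learn Q from the Bellman target
        theta = policy_step(w, theta, batch, lr=0.1)     # learn the policy from Q
    return w, theta
```

Every toy transition is terminal, so the target reduces to r here. In multi-step tasks, the non-terminal branch of bellman_target is the part that bootstraps from Q(s', mu(s')). Calling train() should leave theta close to [0, 1], that is mu(s) ≈ s, which maximizes the reward.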