In my previous post, we derived policy gradients and implemented the REINFORCE algorithm (also known as Monte Carlo policy gradients).
確定! 回上一頁