Tensor([a for (s1,a,r,s2,d) in minibatch])
reward_batch = torch. ...
loss.backward()
losses.append(loss.item())
optimizer.step()
If the game is over, ...
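The fragment above unpacks a minibatch of `(s1, a, r, s2, d)` tuples into per-field batches before computing a loss. A minimal sketch of that pattern follows, using plain Python lists so it runs without PyTorch. Everything beyond the tuple order and the name `reward_batch` is an assumption for illustration: `q_values` is a hypothetical stand-in for a Q-network, `GAMMA` is an arbitrary discount factor, and the one-step Q-learning target is assumed to be what the elided code computes.

```python
import random

GAMMA = 0.9  # discount factor (assumed value, not from the source)


def q_values(state):
    """Hypothetical stand-in for a Q-network: one value per action."""
    return [0.1 * state, 0.2 * state, -0.1 * state]


def td_target(reward, next_state, done):
    """One-step Q-learning target; the bootstrap term is dropped when done == 1."""
    return reward + GAMMA * (1.0 - done) * max(q_values(next_state))


# Replay memory of (s1, a, r, s2, d) tuples, in the same order as the fragment.
replay = [(float(i), i % 3, -1.0, float(i + 1), i == 9) for i in range(10)]

random.seed(0)
minibatch = random.sample(replay, 4)

# Same unpacking pattern as the fragment: one list per field.
state1_batch = [s1 for (s1, a, r, s2, d) in minibatch]
action_batch = [a for (s1, a, r, s2, d) in minibatch]
reward_batch = [r for (s1, a, r, s2, d) in minibatch]
state2_batch = [s2 for (s1, a, r, s2, d) in minibatch]
done_batch = [float(d) for (s1, a, r, s2, d) in minibatch]

# Predicted Q-value of the action actually taken in each transition.
predictions = [q_values(s1)[a] for s1, a in zip(state1_batch, action_batch)]

# Targets built from reward, next state, and the done flag.
targets = [
    td_target(r, s2, d)
    for r, s2, d in zip(reward_batch, state2_batch, done_batch)
]

# Mean squared error between predictions and targets.
loss = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(minibatch)
print(loss)
```

In a PyTorch version, each list would be wrapped in a tensor, and the scalar `loss` would be followed by the `loss.backward()` and `optimizer.step()` calls that appear in the fragment.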