state = np.reshape(observation, [1, environment_dimension])
predict = model.predict([state])[0]
action = np.argmax(predict)
observation, reward, done, ...
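The greedy action-selection step in the snippet can be tried in isolation. The following is a minimal sketch, not the original author's code: `StubModel`, `greedy_action`, `n_actions`, and the sizes used are hypothetical stand-ins, and a fixed linear map takes the place of the trained network. The stub receives the batched array directly rather than wrapped in a list.

```python
import numpy as np

environment_dimension = 4  # assumed observation size (hypothetical)
n_actions = 2              # assumed number of discrete actions (hypothetical)


class StubModel:
    """Stand-in for the trained network: a fixed linear map to Q-values."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = rng.normal(size=(in_dim, out_dim))

    def predict(self, batch):
        # Takes a (batch, in_dim) array, returns (batch, out_dim) Q-values.
        return np.asarray(batch, dtype=float) @ self.weights


def greedy_action(model, observation, dim):
    # Add a batch axis so the input has shape (1, dim).
    state = np.reshape(observation, [1, dim])
    # Drop the batch axis to get one Q-value per action.
    q_values = model.predict(state)[0]
    # Pick the index of the largest Q-value.
    return int(np.argmax(q_values))


model = StubModel(environment_dimension, n_actions)
observation = np.array([0.1, -0.2, 0.05, 0.3])
action = greedy_action(model, observation, environment_dimension)
print(action)
```

Because `np.argmax` returns the first index on ties, an all-zero observation selects action 0 with this stub.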