Inspired by Q-irrelevance abstraction, our auxiliary task trains a deep Q-network (DQN) to predict the true Q-value distribution over all discrete actions.
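As a rough illustration of what such an auxiliary objective could look like, the sketch below takes one possible reading of "Q-value distribution": the Q-values over all discrete actions are normalized with a softmax, and the prediction is scored by cross-entropy against the normalized true values. The softmax normalization, the cross-entropy loss, the `temperature` parameter, and the function names `softmax` and `auxiliary_loss` are all assumptions made for this example; the source does not specify the exact objective.

```python
import math


def softmax(values, temperature=1.0):
    """Turn a list of Q-values into a probability distribution over actions."""
    scaled = [v / temperature for v in values]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def auxiliary_loss(predicted_q, target_q, temperature=1.0):
    """Cross-entropy between softmax(target_q) and softmax(predicted_q).

    Hypothetical loss, assumed for illustration only: the source does not
    state how the predicted and true distributions are compared.
    """
    p = softmax(target_q, temperature)
    q = softmax(predicted_q, temperature)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))


# Toy example with three discrete actions (made-up numbers).
target_q = [1.0, 2.0, 0.5]
good_prediction = [1.1, 1.9, 0.4]   # close to the target ordering
bad_prediction = [2.0, 0.5, 1.0]    # ranks the actions differently
```

Under these assumptions, `auxiliary_loss(good_prediction, target_q)` is smaller than `auxiliary_loss(bad_prediction, target_q)`, and the loss is smallest when the predicted Q-values induce the same distribution as the targets.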