We show that by drawing multiple samples (predictions) per input (datapoint), we can learn with less data as we freely obtain a REINFORCE baseline.
確定! 回上一頁